Following our complaints analysis last week, Chin & Beard Suds Co. (still our mythical organisation) is looking to manage its marketing mailing lists. Sadly, our customers are not just sporadically complaining, they are also choosing not to receive our marketing. We are continually releasing new scents in our products and we want to let our customers know.

Sadly for us, our website has an unsubscribe button that only lets people enter their First and Last Name. It does capture the date they want to unsubscribe, so they can resubscribe at a later date. Our mailing list is a list of emails that are consistent enough that we can join these two data sets together, but not easily.

The business needs to understand not just who they can market to but also how much revenue we are losing from customers who no longer show interest in us. Luckily, we have the raw data to help us understand this, but:

1. We want a clean list of emails that we CAN still market to (and include whether they have unsubscribed and resubscribed, as we might have to handle that).
2. We want a summarised dataset that lets us understand when customers unsubscribe, how much they have spent with us, whether they resubscribe, and what products they are interested in. Keep hold of Subscribed and Resubscribed Customers too for context in our analysis.
Requirements

<img src="https://1.bp.blogspot.com/-YeoSzwI__Pk/XLcpFzRoHMI/AAAAAAAAAM4/2_JDvVtSeYEUFC8BXEcKh8buE1Rx7Bz0wCLcBGAs/s400/Mailing%2BList%2Binput.JPG" alt="Alternative text" />

*note - the Liquid / Bar 1/0 indicators do not matter in the analysis*

## Requirement

* Input data - all three sheets
* Join the Mailing List to the Unsubscribe List to determine who can still receive our marketing messages
* Group the customers into the following Status groups: Subscribed, Resubscribed and Unsubscribed.
* Group the customers into the following bands by their tenure on the mailing list: 0-3, 3-6, 6-12, 12-24, 24+ months
* Add in Customer Lifetime Value to understand our revenue from each customer
* Create two outputs as described in the two goals above (Email list and Analysis of Unsubscribed Customers)
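
The tenure bands in the requirements can be sketched with `pandas.cut`. This is a minimal illustration rather than the approach used in the notebook below; the column name `months_on_list` and the sample values are hypothetical, and placing a boundary value such as 3 months in the upper band (3-6) is an assumption.

```python
import pandas as pd

# Hypothetical tenure values in whole months (not taken from the challenge data).
tenure = pd.DataFrame({'months_on_list': [0, 2, 3, 5, 6, 11, 12, 23, 24, 40]})

# Left-closed bins: [0, 3), [3, 6), [6, 12), [12, 24), [24, inf)
bins = [0, 3, 6, 12, 24, float('inf')]
labels = ['0-3', '3-6', '6-12', '12-24', '24+']

# right=False makes each interval include its lower edge and exclude its upper edge.
tenure['tenure_group'] = pd.cut(
    tenure['months_on_list'], bins=bins, labels=labels, right=False
)
print(tenure)
```

Missing values passed to `pd.cut` stay missing, which would suit customers who have no unsubscribe date.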

## Output
1. Summarised Data Set
* 23 rows (24 including headers)
* 7 columns
2. Refreshed Mailing List
* 100 rows (101 including headers)
* 8 columns

<img src="https://1.bp.blogspot.com/-UFPCgzsELnI/XLdinnElQ7I/AAAAAAAAANE/6hvo7HL0ADQ-WeRvfDQ2f9BqLx5bveD4gCEwYBhgL/s1600/Analysis%2Bdata%2Bset%2Boutput%2B2.JPG" />

Summarised Dataset


<img src="https://2.bp.blogspot.com/-gQn-VLrs8mI/XLdinkvApSI/AAAAAAAAANI/KuJZxbJTGAoNJPVWS9yhZfJ9bYcxEl6jgCEwYBhgL/s400/Mailing%2BList%2Boutput%2B2.JPG" />

Refreshed Mailing List


In [1]:
import pandas as pd
import re
from dateutil.relativedelta import relativedelta

In [2]:
input_file = 'input.xlsx'
df1 = pd.read_excel(io=input_file, sheet_name='Mailing List 2018')
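# Use the leading run of letters in the email as a name-like key, e.g. 'dmalone1@...' -> 'dmalone'
df1['key'] = df1['email'].apply(lambda x: re.findall(r'[a-zA-Z]+', x)[0])
# Truncate to 5 characters so keys such as 'tmenguyb' (email) and 'tmenguy' (name) still match
df1['fixed_key'] = df1['key'].str.slice(stop=5)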
print(df1.head(5))
print(df1.dtypes)

                      email  Liquid  Bar Sign-up Date        key fixed_key
0       dmalone1@tumblr.com     1.0  0.0   2016-01-14    dmalone     dmalo
1           dsworder6@is.gd     0.0  1.0   2016-12-11   dsworder     dswor
2  bandresen8@rakuten.co.jp     1.0  1.0   2016-11-07  bandresen     bandr
3       cmiskin9@dion.ne.jp     1.0  1.0   2017-07-30    cmiskin     cmisk
4    tmenguyb@people.com.cn     0.0  1.0   2016-10-30   tmenguyb     tmeng
email                   object
Liquid                 float64
Bar                    float64
Sign-up Date    datetime64[ns]
key                     object
fixed_key               object
dtype: object


In [3]:
df2 = pd.read_excel(io=input_file, sheet_name='Unsubscribe list')
df2['Date'] = pd.to_datetime(df2['Date'])
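# Lower-case the names so they compare cleanly with the email-derived key
df2['first_name'] = df2['first_name'].str.lower()
df2['last_name'] = df2['last_name'].str.lower()
# Build the same style of key as the emails: first initial + last name, cut to 5 characters
df2['key'] = df2['first_name'].str.slice(stop=1) + df2['last_name']
df2['fixed_key'] = df2['key'].str.slice(stop=5)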
print(df2.head(5))
print(df2.dtypes)

  first_name last_name       Date        key fixed_key
0   donielle    malone 2018-04-23    dmalone     dmalo
1      dorry   sworder 2018-12-26   dsworder     dswor
2   benedict  andresen 2018-12-24  bandresen     bandr
3    cornell    miskin 2018-10-12    cmiskin     cmisk
4  theresita    menguy 2018-08-28    tmenguy     tmeng
first_name            object
last_name             object
Date          datetime64[ns]
key                   object
fixed_key             object
dtype: object



In [4]:
df3 = pd.read_excel(io=input_file, sheet_name='Customer Lifetime Value')
print(df3.head(5))
print(df3.dtypes)

                      email  Liquid Sales to Date  Bar Sales to Date
0       dmalone1@tumblr.com                9380.0             3927.0
1           dsworder6@is.gd                8731.0             4300.0
2  bandresen8@rakuten.co.jp                9655.0              611.0
3       cmiskin9@dion.ne.jp                8371.0             1814.0
4    tmenguyb@people.com.cn                 567.0             1300.0
email                    object
Liquid Sales to Date    float64
Bar Sales to Date       float64
dtype: object


In [5]:
df1_join_df2 = df1.merge(df2, on='fixed_key', how='left')

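def get_status(row):
    # No match in the unsubscribe list: the customer never unsubscribed
    if pd.isnull(row['key_y']):
        return 'Subscribed'
    # Unsubscribe date is after the recorded sign-up date: still unsubscribed
    elif row['Date'] > row['Sign-up Date']:
        return 'Unsubscribed'
    # Sign-up date is on or after the unsubscribe date: the customer signed up again
    else:
        return 'Resubscribed'

def months_diff(row):
    # Whole months between sign-up and unsubscribe; only defined for unsubscribed customers
    if row['status'] == 'Unsubscribed':
        delta = relativedelta(row['Date'], row['Sign-up Date'])
        return delta.years * 12 + delta.months
    else:
        return None

def life_time_group(row):
    # Customers without a tenure (Subscribed / Resubscribed) get no group
    if pd.isnull(row['months_diff']):
        return None
    elif row['months_diff'] < 3:
        return '0-3'
    elif row['months_diff'] < 6:
        return '3-6'
    elif row['months_diff'] < 12:
        return '6-12'
    elif row['months_diff'] < 24:
        return '12-24'
    else:
        return '24+'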

df1_join_df2['report_date'] = pd.to_datetime('April 17, 2019')
df1_join_df2['status'] = df1_join_df2.apply(get_status, axis=1)
df1_join_df2['months_diff'] = df1_join_df2.apply(months_diff, axis=1)
df1_join_df2['months_before_unsubscribed_group'] = df1_join_df2.apply(life_time_group, axis=1)
df1_join_df2.rename(columns={'Liquid': 'Interested in Liquid Soap', 'Bar': 'Interested in Soap Bars'}, inplace=True)
print(df1_join_df2.head(5))
print(df1_join_df2.dtypes)

                      email  Interested in Liquid Soap  \
0       dmalone1@tumblr.com                        1.0   
1           dsworder6@is.gd                        0.0   
2  bandresen8@rakuten.co.jp                        1.0   
3       cmiskin9@dion.ne.jp                        1.0   
4    tmenguyb@people.com.cn                        0.0   

   Interested in Soap Bars Sign-up Date      key_x fixed_key first_name  \
0                      0.0   2016-01-14    dmalone     dmalo   donielle   
1                      1.0   2016-12-11   dsworder     dswor      dorry   
2                      1.0   2016-11-07  bandresen     bandr   benedict   
3                      1.0   2017-07-30    cmiskin     cmisk    cornell   
4                      1.0   2016-10-30   tmenguyb     tmeng  theresita   

  last_name       Date      key_y report_date        status  months_diff  \
0    malone 2018-04-23    dmalone  2019-04-17  Unsubscribed         27.0   
1   sworder 2018-12-26   dsworder  2019-04-17  U
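
Because the join key is truncated to five characters, two different customers could end up sharing a key, and a left merge would then duplicate rows. One hedged way to check for this is to look for repeated keys on each side and to inspect the `_merge` column that `indicator=True` adds. The frames `left` and `right` below are small hypothetical stand-ins for `df1` and `df2`, and `dmalouf2@example.com` is an invented address used only to force a collision.

```python
import pandas as pd

# Hypothetical stand-ins for the mailing list and the unsubscribe list.
left = pd.DataFrame({
    'email': ['dmalone1@tumblr.com', 'dmalouf2@example.com', 'cmiskin9@dion.ne.jp'],
    'fixed_key': ['dmalo', 'dmalo', 'cmisk'],
})
right = pd.DataFrame({
    'last_name': ['malone', 'sworder'],
    'fixed_key': ['dmalo', 'dswor'],
})

# Keys that appear more than once on either side can multiply rows in a merge.
dup_left = left[left['fixed_key'].duplicated(keep=False)]
dup_right = right[right['fixed_key'].duplicated(keep=False)]

# indicator=True adds a '_merge' column; in a left join rows are 'both' or 'left_only'.
checked = left.merge(right, on='fixed_key', how='left', indicator=True)
match_counts = checked['_merge'].value_counts()
print(dup_left)
print(match_counts)
```

Here both `dmalo` emails match the single `malone` record, which is the kind of false match a five-character key can produce.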

In [6]:
merge = df3.merge(df1_join_df2, on='email')
merge.drop(['key_x', 'key_y', 'fixed_key', 'months_diff'], axis=1, inplace=True)
print(merge.head(5))
print(merge.dtypes)

                      email  Liquid Sales to Date  Bar Sales to Date  \
0       dmalone1@tumblr.com                9380.0             3927.0   
1           dsworder6@is.gd                8731.0             4300.0   
2  bandresen8@rakuten.co.jp                9655.0              611.0   
3       cmiskin9@dion.ne.jp                8371.0             1814.0   
4    tmenguyb@people.com.cn                 567.0             1300.0   

   Interested in Liquid Soap  Interested in Soap Bars Sign-up Date first_name  \
0                        1.0                      0.0   2016-01-14   donielle   
1                        0.0                      1.0   2016-12-11      dorry   
2                        1.0                      1.0   2016-11-07   benedict   
3                        1.0                      1.0   2017-07-30    cornell   
4                        0.0                      1.0   2016-10-30  theresita   

  last_name       Date report_date        status  \
0    malone 2018-04-23  2019

In [8]:
group_cols = ['status', 'months_before_unsubscribed_group', 'Interested in Liquid Soap', 'Interested in Soap Bars']
output1 = (
    merge.groupby(group_cols, dropna=False)
    .agg({'email': 'count', 'Liquid Sales to Date': 'sum', 'Bar Sales to Date': 'sum'})
    .reset_index()
)
print(output1.head(5))
print(output1.dtypes)

         status months_before_unsubscribed_group  Interested in Liquid Soap  \
0  Resubscribed                              NaN                        0.0   
1  Resubscribed                              NaN                        1.0   
2  Resubscribed                              NaN                        1.0   
3    Subscribed                              NaN                        0.0   
4    Subscribed                              NaN                        0.0   

   Interested in Soap Bars  email  Liquid Sales to Date  Bar Sales to Date  
0                      1.0      1                7244.0             2690.0  
1                      0.0      4               18081.0            15290.0  
2                      1.0      1                8056.0             3454.0  
3                      0.0      1                6761.0              201.0  
4                      1.0     34              144420.0            81908.0  
status                               object
months_before_unsub

In [13]:
# Select the columns for the mailing list output (no aggregation, one row per merged record)
output2 = merge.loc[:, ['status', 'email', 'Interested in Liquid Soap', 'Interested in Soap Bars', 'Sign-up Date', 'Date', 'Liquid Sales to Date', 'Bar Sales to Date']]
print(output2)

           status                      email  Interested in Liquid Soap  \
0    Unsubscribed        dmalone1@tumblr.com                        1.0   
1    Unsubscribed            dsworder6@is.gd                        0.0   
2    Unsubscribed   bandresen8@rakuten.co.jp                        1.0   
3    Unsubscribed        cmiskin9@dion.ne.jp                        1.0   
4    Unsubscribed     tmenguyb@people.com.cn                        0.0   
..            ...                        ...                        ...   
195    Subscribed   cchapman5a@csmonitor.com                        0.0   
196    Subscribed  rspillett5b@posterous.com                        1.0   
197    Subscribed    wschoenrock5e@webmd.com                        1.0   
198    Subscribed     ggotthardsf5f@webs.com                        1.0   
199    Subscribed   ressberger5g@dedecms.com                        1.0   

     Interested in Soap Bars Sign-up Date       Date  Liquid Sales to Date  \
0                    
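
The requirement expects a refreshed mailing list of 100 rows, while the frame printed above has an index running to 199 and includes Unsubscribed customers. One possible follow-up, sketched here on hypothetical data, is to drop repeated emails and keep only customers who can still be contacted. The frame `sample` and its addresses are invented for illustration, and whether this reproduces the expected output on the real data is untested.

```python
import pandas as pd

# Hypothetical rows standing in for the merged data; 'b@example.com' is duplicated on purpose.
sample = pd.DataFrame({
    'status': ['Unsubscribed', 'Subscribed', 'Subscribed', 'Resubscribed'],
    'email': ['a@example.com', 'b@example.com', 'b@example.com', 'c@example.com'],
})

# Keep customers we can still market to, then keep one row per email address.
marketable = (
    sample[sample['status'] != 'Unsubscribed']
    .drop_duplicates(subset='email')
    .reset_index(drop=True)
)
print(marketable)
```

Comparing `len(marketable)` with the expected row count is a quick sanity check before writing the output file.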