# *Starbucks Data Analysis*

## Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer.

This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period, after which it expires. For example, a BOGO offer might be valid for only 5 days. Informational offers also have a validity period in the data set, even though these ads merely provide information about a product. If an informational offer has 7 days of validity, you can assume the customer feels the influence of the offer for 7 days after receiving the advertisement.
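
The validity period can be turned into an explicit window per offer. The following is a minimal sketch on made-up rows, assuming `test_day` is the day the offer was received, `duration` is the validity in days, and both endpoints of the window are inclusive. The `valid_until` column and the `is_within_validity` helper are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy rows (made-up values) that reuse column names from this data set.
offers = pd.DataFrame({
    "customer_id": ["a", "b"],
    "offer_type": ["bogo", "informational"],
    "test_day": [3, 10],      # assumed: day the offer was received
    "duration": [5.0, 7.0],   # validity period in days
})

# Assumption: the window runs from the receipt day through test_day + duration.
offers["valid_until"] = offers["test_day"] + offers["duration"]


def is_within_validity(offer_row, purchase_day):
    """Return True if purchase_day falls inside the offer's validity window."""
    return bool(offer_row["test_day"] <= purchase_day <= offer_row["valid_until"])


print(offers[["offer_type", "test_day", "valid_until"]])
print(is_within_validity(offers.iloc[0], 8))
print(is_within_validity(offers.iloc[1], 18))
```

Whether the last day counts as valid is a modelling choice. Switching the second comparison to `<` makes the window exclusive at the end.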

Someone using the app might make a purchase through the app without having received or seen an offer.

### Importing Dataset

In [64]:
import pandas as pd
import numpy as np

import seaborn as sb
import matplotlib.pyplot as plt

%matplotlib inline

In [65]:
data = pd.read_csv('data/merged_data.csv')

In [66]:
data.sample(5)

Unnamed: 0,customer_id,event,test_day,offer_id,amount,reward,gender,age,became_member_on,income,difficulty,duration,offer_type,email,mobile,social,web
138050,1c2fbdb4370440b8bafc1a921285af06,offer completed,22,2906b810c7d4411798c6938adc9daaa5,0.0,2,O,55,2017-12-29,84000.0,10.0,7.0,discount,1.0,1.0,0.0,1.0
99664,0d21bafce46a417d9488c090367278e1,offer viewed,27,ae264e3637204a6fb9bb56bc8210ddfd,0.0,0,M,56,2017-07-13,72000.0,10.0,7.0,bogo,1.0,1.0,1.0,0.0
71273,540c7c9167ee422d9c9e8291dc62bd37,transaction,10,,0.65,0,M,23,2017-05-29,37000.0,,,,,,,
6760,7e527b19f70b4aeeb1e99ed78aca1598,offer viewed,7,3f207df678b143eea3cee63160fa8bed,0.0,0,M,71,2016-04-06,95000.0,0.0,4.0,informational,1.0,1.0,0.0,1.0
179967,0dab5a72d09e42ee957a574567a5ad3f,transaction,28,,12.28,0,M,60,2015-09-26,71000.0,,,,,,,


In [67]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272762 entries, 0 to 272761
Data columns (total 17 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   customer_id       272762 non-null  object 
 1   event             272762 non-null  object 
 2   test_day          272762 non-null  int64  
 3   offer_id          148805 non-null  object 
 4   amount            272762 non-null  float64
 5   reward            272762 non-null  int64  
 6   gender            272762 non-null  object 
 7   age               272762 non-null  int64  
 8   became_member_on  272762 non-null  object 
 9   income            272762 non-null  float64
 10  difficulty        148805 non-null  float64
 11  duration          148805 non-null  float64
 12  offer_type        148805 non-null  object 
 13  email             148805 non-null  float64
 14  mobile            148805 non-null  float64
 15  social            148805 non-null  float64
 16  web               14
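
The non-null counts above show that offer fields such as `offer_id` and `difficulty` are filled in for only 148805 of the 272762 rows. A plausible reading is that the missing values belong to `transaction` rows, which carry no offer. One way to check this is to count nulls per column for each event type. The sketch below uses a small made-up frame rather than the real file.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) mirroring the mix of offer and transaction rows.
toy = pd.DataFrame({
    "event": ["offer received", "transaction", "offer viewed", "transaction"],
    "offer_id": ["o1", np.nan, "o1", np.nan],
    "amount": [0.0, 4.5, 0.0, 12.0],
    "duration": [7.0, np.nan, 7.0, np.nan],
})

# Flag missing cells, then sum the flags within each event type.
nulls_by_event = toy.drop(columns="event").isna().groupby(toy["event"]).sum()
print(nulls_by_event)
```

If the reading holds on the real data, only the `transaction` row of this table has non-zero counts, and only in the offer-related columns.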

#### Converting `became_member_on` to datetime format

In [68]:
data['became_member_on'] = pd.to_datetime(data['became_member_on'])
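
Once the column is a datetime, the `.dt` accessor makes date-based features easy to derive. The sketch below is illustrative only: the `member_year` and `tenure_days` columns and the `reference_date` cutoff are assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy frame with two membership dates taken from the samples shown in this notebook.
toy = pd.DataFrame({"became_member_on": ["2017-12-29", "2016-04-06"]})
toy["became_member_on"] = pd.to_datetime(toy["became_member_on"])

# Hypothetical derived columns.
toy["member_year"] = toy["became_member_on"].dt.year

# Assumed cutoff date, chosen only for illustration.
reference_date = pd.Timestamp("2018-12-31")
toy["tenure_days"] = (reference_date - toy["became_member_on"]).dt.days

print(toy.dtypes)
print(toy)
```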

In [69]:
data.sample(5)

Unnamed: 0,customer_id,event,test_day,offer_id,amount,reward,gender,age,became_member_on,income,difficulty,duration,offer_type,email,mobile,social,web
160011,716a4a753ef345beaf8d57e266b930bd,transaction,19,,6.35,0,F,76,2018-04-11,52000.0,,,,,,,
40777,133806412d2a4ebd813370359f357878,offer received,14,f19421c1d4aa40978ebb69ca19b0e20d,0.0,0,F,36,2017-09-08,52000.0,5.0,5.0,bogo,1.0,1.0,1.0,1.0
235699,5e44bc50a2b84b0a89d7246f5a85617e,offer received,7,4d5c57ea9a6940dd891ad53e9dbe8da0,0.0,0,M,43,2018-07-23,99000.0,10.0,5.0,bogo,1.0,1.0,1.0,1.0
24551,a95b19c70d724d968d7cd1a91b1e9033,transaction,17,,28.47,0,M,58,2016-05-14,117000.0,,,,,,,
202685,397363559b35488cb7f52f79b055a331,offer viewed,21,2298d6c36e964ae4a3e7e9706d1fb8c2,0.0,0,M,20,2018-05-19,34000.0,7.0,7.0,discount,1.0,1.0,1.0,1.0


In [70]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272762 entries, 0 to 272761
Data columns (total 17 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   customer_id       272762 non-null  object        
 1   event             272762 non-null  object        
 2   test_day          272762 non-null  int64         
 3   offer_id          148805 non-null  object        
 4   amount            272762 non-null  float64       
 5   reward            272762 non-null  int64         
 6   gender            272762 non-null  object        
 7   age               272762 non-null  int64         
 8   became_member_on  272762 non-null  datetime64[ns]
 9   income            272762 non-null  float64       
 10  difficulty        148805 non-null  float64       
 11  duration          148805 non-null  float64       
 12  offer_type        148805 non-null  object        
 13  email             148805 non-null  float64       
 14  mobi

In [71]:
data.describe()

Unnamed: 0,test_day,amount,reward,age,income,difficulty,duration,email,mobile,social,web
count,272762.0,272762.0,272762.0,272762.0,272762.0,148805.0,148805.0,148805.0,148805.0,148805.0,148805.0
mean,15.011812,6.360646,0.588575,53.840696,64337.000755,7.890561,6.625207,1.0,0.91716,0.658311,0.806747
std,8.331806,22.509207,1.889452,17.551337,21243.762941,5.041335,2.133035,0.0,0.275641,0.474277,0.394851
min,0.0,0.0,0.0,18.0,30000.0,0.0,3.0,1.0,0.0,0.0,0.0
25%,7.0,0.0,0.0,41.0,48000.0,5.0,5.0,1.0,1.0,0.0,1.0
50%,17.0,0.0,0.0,55.0,62000.0,10.0,7.0,1.0,1.0,1.0,1.0
75%,22.0,9.14,0.0,66.0,78000.0,10.0,7.0,1.0,1.0,1.0,1.0
max,29.0,1062.28,10.0,101.0,120000.0,20.0,10.0,1.0,1.0,1.0,1.0
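
Note that the median `amount` above is 0.0. The sampled offer rows all show an `amount` of 0.0, so the overall statistics are probably diluted by non-transaction events. A minimal sketch on made-up values shows how restricting to `transaction` rows changes the picture:

```python
import pandas as pd

# Toy frame (made-up values): offer rows carry amount 0.0, as in the samples.
toy = pd.DataFrame({
    "event": ["offer received", "transaction", "offer viewed", "transaction"],
    "amount": [0.0, 4.0, 0.0, 12.0],
})

# Mean over all rows, including the zero-amount offer rows.
overall_mean = toy["amount"].mean()

# Mean over actual purchases only.
spend_mean = toy.loc[toy["event"] == "transaction", "amount"].mean()

print(overall_mean, spend_mean)
```

The same filter can be applied before calling `describe()` to summarise spending on its own.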


#### Modifying Data

In [72]:
data.sample(5)

Unnamed: 0,customer_id,event,test_day,offer_id,amount,reward,gender,age,became_member_on,income,difficulty,duration,offer_type,email,mobile,social,web
84937,671a142bc07e47389b920cb0c37793bd,offer received,7,4d5c57ea9a6940dd891ad53e9dbe8da0,0.0,0,M,29,2018-01-25,67000.0,10.0,5.0,bogo,1.0,1.0,1.0,1.0
255363,05bedc6bbcc64a41b1745f6efe00776d,offer viewed,18,3f207df678b143eea3cee63160fa8bed,0.0,0,F,70,2015-07-29,105000.0,0.0,4.0,informational,1.0,1.0,0.0,1.0
82213,937adf623da24ea1816eba6b4775f689,offer viewed,3,5a8bc65990b245e5a138643cd4eb9837,0.0,0,F,52,2018-06-21,55000.0,0.0,3.0,informational,1.0,1.0,1.0,0.0
245160,f0954d1f55444d6aba231d732be8ebdc,transaction,18,,18.53,0,F,38,2017-11-16,58000.0,,,,,,,
177559,0c3e283397d74cd9a3f535c7dd1d188c,offer received,21,5a8bc65990b245e5a138643cd4eb9837,0.0,0,F,51,2017-04-21,113000.0,0.0,3.0,informational,1.0,1.0,1.0,0.0


### *Separating `transaction` events from offer events*

In [73]:
transaction_data = data.query('event == "transaction"')
# Offer-related columns are entirely null for transaction rows,
# so drop every column that contains null values (axis=1 drops columns, not rows)
transaction_data = transaction_data.dropna(axis=1)

offer_data = data.query('event != "transaction"')

In [74]:
transaction_data.sample(5)

Unnamed: 0,customer_id,event,test_day,amount,reward,gender,age,became_member_on,income
151966,adca596a441a4d68b76a57d19f4041ab,transaction,14,29.63,0,F,83,2016-07-19,86000.0
32294,78f6ac1eb6c240368c54284c682e4224,transaction,1,10.47,0,F,77,2017-11-03,64000.0
19970,21e1f3e157e74c028418d85d4365a5e6,transaction,15,34.64,0,M,53,2017-08-04,92000.0
136840,d634cc586de4494d8a7b07a54df7c91f,transaction,24,10.53,0,M,59,2015-12-13,60000.0
28405,df227598be2f4937b3c60036010042a9,transaction,14,18.01,0,M,47,2017-06-17,84000.0
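
The call `dropna(axis=1)` removes columns, not rows, and by default it drops a column as soon as it contains a single null. That works here because the remaining columns are complete, but it is worth knowing the alternatives. The sketch below uses a made-up frame. Its partially missing `income` column is hypothetical and added only to show the difference between `how="any"` and `how="all"`.

```python
import numpy as np
import pandas as pd

# Toy transaction rows (made-up values).
toy = pd.DataFrame({
    "event": ["transaction", "transaction"],
    "amount": [4.0, 12.0],
    "offer_id": [np.nan, np.nan],   # entirely missing, as for real transactions
    "income": [50000.0, np.nan],    # hypothetical partially missing column
})

# Default how="any": drops every column that contains at least one NaN.
dropped_columns = toy.dropna(axis=1)

# how="all": drops only columns that are NaN in every row.
only_empty_columns = toy.dropna(axis=1, how="all")

# axis=0 drops rows instead; here every row has a NaN, so nothing is left.
dropped_rows = toy.dropna(axis=0)

print(list(dropped_columns.columns))
print(list(only_empty_columns.columns))
print(len(dropped_rows))
```

Using `how="all"` is the more conservative choice when a column might hold a few genuine gaps that should be kept for later handling.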


In [75]:
offer_data.sample(5)

Unnamed: 0,customer_id,event,test_day,offer_id,amount,reward,gender,age,became_member_on,income,difficulty,duration,offer_type,email,mobile,social,web
68289,b7e641061b4345f1adfffffad7e7afdd,offer viewed,17,2298d6c36e964ae4a3e7e9706d1fb8c2,0.0,0,M,20,2016-03-03,31000.0,7.0,7.0,discount,1.0,1.0,1.0,1.0
219758,637b71c75f1444e38e2e361746590e5a,offer completed,19,2298d6c36e964ae4a3e7e9706d1fb8c2,0.0,3,M,23,2014-09-10,41000.0,7.0,7.0,discount,1.0,1.0,1.0,1.0
252278,8f97634e6de846dcad837f2080273ee5,offer viewed,8,2298d6c36e964ae4a3e7e9706d1fb8c2,0.0,0,M,61,2017-11-12,54000.0,7.0,7.0,discount,1.0,1.0,1.0,1.0
56817,ff9f73ead16a4f9b9e1a53a27280af92,offer completed,1,fafdcd668e3743c1bb461111dcafc2a4,0.0,2,F,43,2015-07-31,67000.0,10.0,10.0,discount,1.0,1.0,1.0,1.0
169881,04e0eff0b4704fc291bbe1275f01fb7a,offer completed,24,2906b810c7d4411798c6938adc9daaa5,0.0,2,M,54,2017-12-10,65000.0,10.0,7.0,discount,1.0,1.0,0.0,1.0


In [76]:
transaction_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 123957 entries, 2 to 272755
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   customer_id       123957 non-null  object        
 1   event             123957 non-null  object        
 2   test_day          123957 non-null  int64         
 3   amount            123957 non-null  float64       
 4   reward            123957 non-null  int64         
 5   gender            123957 non-null  object        
 6   age               123957 non-null  int64         
 7   became_member_on  123957 non-null  datetime64[ns]
 8   income            123957 non-null  float64       
dtypes: datetime64[ns](1), float64(2), int64(3), object(3)
memory usage: 9.5+ MB


In [77]:
offer_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 148805 entries, 0 to 272761
Data columns (total 17 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   customer_id       148805 non-null  object        
 1   event             148805 non-null  object        
 2   test_day          148805 non-null  int64         
 3   offer_id          148805 non-null  object        
 4   amount            148805 non-null  float64       
 5   reward            148805 non-null  int64         
 6   gender            148805 non-null  object        
 7   age               148805 non-null  int64         
 8   became_member_on  148805 non-null  datetime64[ns]
 9   income            148805 non-null  float64       
 10  difficulty        148805 non-null  float64       
 11  duration          148805 non-null  float64       
 12  offer_type        148805 non-null  object        
 13  email             148805 non-null  float64       
 14  mobi
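
As a sanity check, the two subsets should partition the original frame. The counts above agree with this: 123957 transaction rows plus 148805 offer rows equals the 272762 rows reported for `data`. A minimal sketch of the same check on made-up rows, using a boolean mask instead of `query` (the `toy_*` names are hypothetical):

```python
import pandas as pd

# Toy frame (made-up values) with both kinds of events.
toy = pd.DataFrame({
    "event": ["offer received", "transaction", "offer viewed",
              "transaction", "offer completed"],
    "amount": [0.0, 4.0, 0.0, 12.0, 0.0],
})

# Split on a single boolean mask so the two subsets are complementary.
is_transaction = toy["event"] == "transaction"
toy_transactions = toy[is_transaction]
toy_offers = toy[~is_transaction]

# Every row lands in exactly one subset.
assert len(toy_transactions) + len(toy_offers) == len(toy)
assert toy_transactions.index.intersection(toy_offers.index).empty

print(toy["event"].value_counts())
```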

## Exploratory Data Analysis

### *Transaction Data*