### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then asks the person whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’.  There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50). 

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons.  To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece. 





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \\$20 to \\$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [326]:
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import pandas as pd
import numpy as np

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [327]:
data = pd.read_csv('data/coupons.csv')

In [328]:
# Set options to display all the columns instead of summarizing using ...
pd.set_option('display.max_columns', None)
data.head(2)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,0


In [329]:
# Print column names
data.columns

Index(['destination', 'passanger', 'weather', 'temperature', 'time', 'coupon',
       'expiration', 'gender', 'age', 'maritalStatus', 'has_children',
       'education', 'occupation', 'income', 'car', 'Bar', 'CoffeeHouse',
       'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50',
       'toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min',
       'direction_same', 'direction_opp', 'Y'],
      dtype='object')

In [330]:
data.describe()

Unnamed: 0,temperature,has_children,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
count,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0
mean,63.301798,0.414144,1.0,0.561495,0.119126,0.214759,0.785241,0.568433
std,19.154486,0.492593,0.0,0.496224,0.32395,0.410671,0.410671,0.495314
min,30.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,55.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
50%,80.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0
75%,80.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0
max,80.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [331]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

In [332]:
# Print unique values appearing in every column
data.agg(['unique']).T

Unnamed: 0,unique
destination,"[No Urgent Place, Home, Work]"
passanger,"[Alone, Friend(s), Kid(s), Partner]"
weather,"[Sunny, Rainy, Snowy]"
temperature,"[55, 80, 30]"
time,"[2PM, 10AM, 6PM, 7AM, 10PM]"
coupon,"[Restaurant(<20), Coffee House, Carry out & Ta..."
expiration,"[1d, 2h]"
gender,"[Female, Male]"
age,"[21, 46, 26, 31, 41, 50plus, 36, below21]"
maritalStatus,"[Unmarried partner, Single, Married partner, D..."


2. Investigate the dataset for missing or problematic data.

In [334]:
# drop duplicate rows
data = data.drop_duplicates()
data.duplicated().sum()

0

In [335]:
# Count the NaN values in each column
data.isna().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12502
Bar                       107
CoffeeHouse               217
CarryAway                 150
RestaurantLessThan20      129
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

3. Decide what to do about your missing data -- drop, replace, other...

The `car` column is missing (NaN) in about 99% of rows (12,502 missing values), so we drop it.
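
As a quick check on that kind of figure, the share of missing values per column can be computed directly. The following is a minimal sketch on a small made-up frame; `toy`, its values, and the 0.5 cutoff are illustrative assumptions, not rows or settings taken from `coupons.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data; values are made up for illustration.
toy = pd.DataFrame({
    'car': [np.nan, np.nan, np.nan, 'Scooter and motorcycle'],
    'Bar': ['never', np.nan, '1~3', 'less1'],
    'Y':   [1, 0, 1, 0],
})

# Fraction of missing entries per column (the mean of a boolean mask)
missing_share = toy.isna().mean()
print(missing_share)

# Columns that are mostly empty are candidates for dropping.
# The 0.5 cutoff is an arbitrary choice for this sketch.
mostly_missing = missing_share[missing_share > 0.5].index.tolist()
print(mostly_missing)
```

On the real data the same two lines would be applied to `data` instead of `toy`.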

In [336]:
# drop 'car' column
data = data.drop(['car'], axis=1)
data.head(2)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,,4~8,1~3,1,0,0,0,1,0


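The remaining NaN values are all in the visit-frequency columns (`Bar`, `CoffeeHouse`, `CarryAway`, `RestaurantLessThan20`, `Restaurant20To50`). We replace them with the string 'never', which assumes that an unanswered question means the respondent never goes.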

In [337]:
# Handle Nan in the rest of the columns by converting it to 'never'
data = data.fillna( { 'Bar': 'never', 
                      'CoffeeHouse': 'never', 
                      'CarryAway': 'never', 
                      'RestaurantLessThan20': 'never', 
                      'Restaurant20To50': 'never' } )
# Ensure there are no more Nan entries
data.isna().sum()

destination             0
passanger               0
weather                 0
temperature             0
time                    0
coupon                  0
expiration              0
gender                  0
age                     0
maritalStatus           0
has_children            0
education               0
occupation              0
income                  0
Bar                     0
CoffeeHouse             0
CarryAway               0
RestaurantLessThan20    0
Restaurant20To50        0
toCoupon_GEQ5min        0
toCoupon_GEQ15min       0
toCoupon_GEQ25min       0
direction_same          0
direction_opp           0
Y                       0
dtype: int64
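
Filling with 'never' changes the category counts, so it can help to look at a column before and after the fill. This is a minimal sketch on made-up values (`toy` is a hypothetical frame, not the survey data); `value_counts(dropna=False)` keeps NaN as its own row so the size of the change is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical visit-frequency answers, including two skipped ones
toy = pd.DataFrame({'Bar': ['never', np.nan, '1~3', 'never', np.nan, 'less1']})

# Counts before filling, with NaN shown as its own category
before = toy['Bar'].value_counts(dropna=False)
print(before)

# Counts after treating skipped answers as 'never'
after = toy['Bar'].fillna('never').value_counts()
print(after)

# How many rows moved into the 'never' category because of the fill
moved = int(toy['Bar'].isna().sum())
print(f"{moved} rows were relabelled as 'never'")
```

If the relabelled share is large for a column, keeping a separate category such as 'unknown' would be an alternative worth considering.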

4. What proportion of the total observations chose to accept the coupon? 



In [338]:
# proportion = number of rows with Y=1 / total number of rows
proportionAccepted = data.groupby('Y').count().iloc[1, 0] / data.shape[0]
print( f'Proportion of total observations that chose to accept coupon = {proportionAccepted}')

Proportion of total observations that chose to accept coupon = 0.5675654242664552
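
Because `Y` is coded as 0/1, the same proportion can also be read off as the mean of the column. The sketch below checks that equivalence on a small made-up frame (`toy` is a hypothetical stand-in for `data`).

```python
import pandas as pd

# Hypothetical responses; 3 of 5 accepted
toy = pd.DataFrame({
    'coupon': ['Bar', 'Bar', 'Coffee House', 'Coffee House', 'Bar'],
    'Y':      [1, 0, 1, 1, 0],
})

# Count-based version, mirroring the cell above
count_based = toy.groupby('Y').count().iloc[1, 0] / toy.shape[0]

# Mean of a 0/1 column is the share of ones
mean_based = toy['Y'].mean()

# value_counts(normalize=True) gives both shares at once
shares = toy['Y'].value_counts(normalize=True)

print(count_based, mean_based)
print(shares)
```

The mean-based form does not depend on row or column positions, which makes it less fragile than `iloc[1, 0]` if a group happens to be empty.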


5. Use a bar plot to visualize the `coupon` column.

In [375]:
fig = px.bar(data, x='coupon', color='Y')

fig.update_layout(plot_bgcolor='rgba(0, 0, 0, 0)',
                  paper_bgcolor='rgba(0, 0, 0, 0)',)


fig.show()
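
The plot shows counts per coupon type; the acceptance rate per type can be tabulated alongside it. A minimal sketch, assuming a frame with `coupon` and `Y` columns like the one above (`toy` and its values are made up):

```python
import pandas as pd

# Hypothetical rows standing in for the survey data
toy = pd.DataFrame({
    'coupon': ['Bar', 'Bar', 'Coffee House', 'Coffee House',
               'Coffee House', 'Restaurant(<20)'],
    'Y':      [1, 0, 1, 1, 0, 1],
})

# One row per coupon type: how many were offered, how many accepted, and the rate
summary = (
    toy.groupby('coupon')['Y']
       .agg(offered='count', accepted='sum', rate='mean')
       .sort_values('rate', ascending=False)
)
print(summary)
```

The resulting `summary` table could also be passed to a bar plot if a rate chart is preferred over a count chart.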

6. Use a histogram to visualize the temperature column.

In [340]:
px.histogram(data, x='temperature', color='coupon')
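
Since `temperature` takes only three values (30, 55, 80), a small cross-tabulation against `Y` can complement the histogram. This is a sketch on made-up rows (`toy` is hypothetical); `normalize='index'` turns each temperature row into shares that sum to 1.

```python
import pandas as pd

# Hypothetical rows; temperature uses the three values seen in the data
toy = pd.DataFrame({
    'temperature': [30, 30, 55, 55, 80, 80, 80, 80],
    'Y':           [0, 1, 1, 0, 1, 1, 1, 0],
})

# Share of rejected (0) and accepted (1) coupons within each temperature
rates = pd.crosstab(toy['temperature'], toy['Y'], normalize='index')
print(rates)
```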

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [341]:
barData = data[ data[ 'coupon'] == 'Bar']
barData.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,1,0,0,1,0
13,Home,Alone,Sunny,55,6PM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,0,0,1,0,1
17,Work,Alone,Sunny,55,7AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,1,1,0,1,0
24,No Urgent Place,Friend(s),Sunny,80,10AM,Bar,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,0,1,1
35,Home,Alone,Sunny,55,6PM,Bar,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,1,0,1


2. What proportion of bar coupons were accepted?


In [342]:
# proportion = number of rows in barData with Y=1 / total number of rows in barData
proportionBarCouponAccepted = barData.groupby('Y').count().iloc[1,0] / barData.shape[0]
print( f'Proportion of bar coupons accepted = {proportionBarCouponAccepted}')

Proportion of bar coupons accepted = 0.4099502487562189


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [390]:
# List of values that represent <=3 bar visits. Remember NaN has been converted to 'never' 
barThreeOrFewer = [ 'never', 'less1', '1~3' ]
# Number of acceptance among people who have visited bar <=3 times
threeOrFewerNumAccepted = barData.query('Bar in @barThreeOrFewer').groupby('Y').count().iloc[1,1]
threeOrFewerNumNotAccepted = barData.query('Bar in @barThreeOrFewer').groupby('Y').count().iloc[0,0]
# Total number of rows
total1 = threeOrFewerNumNotAccepted + threeOrFewerNumAccepted
# Acceptance rate = Number of acceptance among people who have visited bar <=3 times / total
threeOrFewerAcceptance = threeOrFewerNumAccepted / total1

# Number of acceptance among people who have visited bar >3 times
gtThreeNumAccepted = barData.query('Bar not in @barThreeOrFewer').groupby('Y').count().iloc[1,1]
gtThreeNumNotAccepted = barData.query('Bar not in @barThreeOrFewer').groupby('Y').count().iloc[0,0]
total2 = gtThreeNumAccepted + gtThreeNumNotAccepted
# Acceptance rate = Number of acceptance among people who have visited bar >3 times / total
gtThreeAcceptance = gtThreeNumAccepted / total2

print( f'{threeOrFewerNumAccepted} out of a total of {total1} : Acceptance rate from people who went to bar <=3 times = {threeOrFewerAcceptance}' )
print( f'{gtThreeNumAccepted} out of a total of {total2} : Acceptance rate from people who went to bar >3 times = {gtThreeAcceptance}' )

671 out of a total of 1811 : Acceptance rate from people who went to bar <=3 times = 0.37051352843732743
153 out of a total of 199 : Acceptance rate from people who went to bar >3 times = 0.7688442211055276
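
The accepted/total/rate pattern repeats in the cells that follow, so it could be wrapped in a small helper. This is a sketch only: `acceptance_rate` is a hypothetical function name and `toy_bar` is made-up data, not the real `barData`.

```python
import pandas as pd

def acceptance_rate(frame, mask):
    """Return (accepted, total, rate) for the rows selected by a boolean mask."""
    subset = frame.loc[mask, 'Y']
    accepted = int(subset.sum())
    total = int(subset.shape[0])
    rate = accepted / total if total else float('nan')
    return accepted, total, rate

# Hypothetical bar-coupon rows
toy_bar = pd.DataFrame({
    'Bar': ['never', 'less1', '1~3', '4~8', 'gt8', 'never', '4~8', '1~3'],
    'Y':   [0, 0, 1, 1, 1, 0, 0, 1],
})

# True for drivers who go to a bar 3 or fewer times a month
three_or_fewer = toy_bar['Bar'].isin(['never', 'less1', '1~3'])

low = acceptance_rate(toy_bar, three_or_fewer)
high = acceptance_rate(toy_bar, ~three_or_fewer)   # ~ flips the mask: everyone else
print(low, high)
```

Using `sum()` and the row count of `Y` avoids relying on `groupby('Y').count().iloc[...]` positions, which would break if a group contained only accepted or only rejected rows.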


4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others.  Is there a difference?


In [393]:
# List representing value over the age of 25
overAge25 = ['46', '26', '31', '41', '50plus', '36']
# List representing value with > 1 bar visit per month
gtOnePerMonthBarVisit = ['1~3', 'gt8', '4~8']

# Number accepting coupon who fall into both above condition
tmp1 = barData.loc[ barData['age'].isin(overAge25) & 
                    barData['Bar'].isin(gtOnePerMonthBarVisit) ]\
              .groupby('Y').count()
bothCondNumAccepted = tmp1.iloc[1,1]
bothCondNumNotAccepted = tmp1.iloc[0,0]
# Total number of rows
total1 = bothCondNumAccepted + bothCondNumNotAccepted

# Number accepting coupon who do not fall into above conditions
tmp2 = barData.query('age not in @overAge25 | Bar not in @gtOnePerMonthBarVisit')\
              .groupby('Y').count()
notBothCondNumAccepted = tmp2.iloc[1,1]
notBothCondNumNotAccepted = tmp2.iloc[0,0]
total2 = notBothCondNumAccepted + notBothCondNumNotAccepted

bothCondRate = bothCondNumAccepted / total1
notBothCondRate = notBothCondNumAccepted / total2
print( f'{bothCondNumAccepted} out of a total of {total1} : Acceptance rate from people >25 and >1 bar visit per month = {bothCondRate}' )
print( f'{notBothCondNumAccepted} out of a total of {total2} : Acceptance rate from all others = {notBothCondRate}' )

292 out of a total of 420 : Acceptance rate from people >25 and >1 bar visit per month = 0.6952380952380952
532 out of a total of 1590 : Acceptance rate from all others = 0.33459119496855344
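
The "all others" group is the complement of "over 25 AND more than one bar visit", which by De Morgan's law is "not over 25 OR not more than one bar visit". The sketch below checks on made-up rows that the `query` form used above selects the same rows as negating a single boolean mask (`toy` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical bar-coupon rows
toy = pd.DataFrame({
    'age': ['26', '31', '21', 'below21', '46', '36', '21', '50plus'],
    'Bar': ['1~3', '4~8', 'gt8', 'never', 'never', '1~3', 'less1', 'gt8'],
    'Y':   [1, 1, 1, 0, 0, 0, 0, 1],
})

over25 = ['26', '31', '36', '41', '46', '50plus']
frequent = ['1~3', '4~8', 'gt8']

# Rows meeting both conditions
both = toy['age'].isin(over25) & toy['Bar'].isin(frequent)

# Complement written two ways: negated mask vs. the OR-of-negations query
others_mask = toy[~both]
others_query = toy.query('age not in @over25 | Bar not in @frequent')

rate_both = toy.loc[both, 'Y'].mean()
rate_others = others_mask['Y'].mean()
print(rate_both, rate_others)
print(list(others_mask.index) == list(others_query.index))
```

Negating one mask keeps the two groups guaranteed to partition the data, which is easy to get wrong when the complement is written out by hand.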


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [394]:
# List representing value with > 1 bar visit per month
gtOnePerMonthBarVisit = ['1~3', 'gt8', '4~8']

# List representing no-kid passenger
noKid = ['Friend(s)', 'Partner']

# List representing farming/fishing/forestry
occupationIsFarming = ['Farming Fishing & Forestry']

tmp = barData.query( 'Bar in @gtOnePerMonthBarVisit & passanger in @noKid & occupation not in @occupationIsFarming')\
             .groupby('Y').count()
numAccepted = tmp.iloc[1,1]
numNotAccepted = tmp.iloc[0,0]
total = numAccepted + numNotAccepted    
acceptanceRate = numAccepted / total
print( f'{numAccepted} out of {total}: Acceptance rate among drivers who go to bars >1 per month and have no kid passengers and who dont do farming/fishing/forestry = {acceptanceRate}')

140 out of 195: Acceptance rate among drivers who go to bars >1 per month and have no kid passengers and who dont do farming/fishing/forestry = 0.717948717948718


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [396]:
# List representing value with > 1 bar visit per month
gtOnePerMonthBarVisit = ['1~3', 'gt8', '4~8']
# List representing no-kid passenger
noKid = ['Friend(s)', 'Partner']
# maritalStatus is Widowed
maritalStatusWidowed = ['Widowed']
# age under 30
underAge30 = ['21', '26', 'below21']
# RestaurantLessThan20 has >4 times
cheapRestGt4 = ['4~8', 'gt8']
# income: only the $37500 - $49999 bracket is listed here; lower brackets are not included
incomeLt50K = ['$37500 - $49999']

tmp1 = barData.query( 'Bar in @gtOnePerMonthBarVisit & passanger in @noKid & maritalStatus not in @maritalStatusWidowed')\
               .groupby('Y').count()
tmp2 = barData.query( 'Bar in @gtOnePerMonthBarVisit & age in @underAge30')\
               .groupby('Y').count()
tmp3 = barData.query( 'RestaurantLessThan20 in @cheapRestGt4 & income in @incomeLt50K')\
               .groupby('Y').count()

rate1 = tmp1.iloc[1,1] / (tmp1.iloc[1,1] + tmp1.iloc[0,0])
rate2 = tmp2.iloc[1,1] / (tmp2.iloc[1,1] + tmp2.iloc[0,0])
rate3 = tmp3.iloc[1,1] / (tmp3.iloc[1,1] + tmp3.iloc[0,0])

print( rate1, rate2, rate3 )



0.717948717948718 0.7217391304347827 0.4897959183673469
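
The three conditions can also be kept as named boolean masks, which makes it possible to report each rate separately and the rate for drivers meeting any of them (the "OR" reading of the prompt). This is a sketch on made-up rows. The income labels below are assumed to match the strings in the data and should be checked against `data['income'].unique()`; `toy`, `masks`, and the key names are hypothetical.

```python
import pandas as pd

# Hypothetical bar-coupon rows
toy = pd.DataFrame({
    'Bar': ['1~3', 'never', '4~8', 'gt8', 'less1', '1~3'],
    'passanger': ['Friend(s)', 'Alone', 'Partner', 'Kid(s)', 'Alone', 'Partner'],
    'maritalStatus': ['Single', 'Married partner', 'Widowed', 'Single',
                      'Single', 'Married partner'],
    'age': ['21', '41', '50plus', '26', 'below21', '31'],
    'RestaurantLessThan20': ['4~8', 'less1', '1~3', 'gt8', '4~8', '1~3'],
    'income': ['$37500 - $49999', '$62500 - $74999', '$25000 - $37499',
               '$12500 - $24999', 'Less than $12500', '$100000 or More'],
    'Y': [1, 0, 1, 0, 0, 1],
})

frequent_bar = toy['Bar'].isin(['1~3', '4~8', 'gt8'])
# Assumed labels for every bracket below 50K
under_50k = ['Less than $12500', '$12500 - $24999',
             '$25000 - $37499', '$37500 - $49999']

masks = {
    'no_kid_not_widowed': frequent_bar
        & toy['passanger'].isin(['Friend(s)', 'Partner'])
        & (toy['maritalStatus'] != 'Widowed'),
    'under_30': frequent_bar & toy['age'].isin(['below21', '21', '26']),
    'cheap_low_income': toy['RestaurantLessThan20'].isin(['4~8', 'gt8'])
        & toy['income'].isin(under_50k),
}
# Drivers who satisfy at least one of the three conditions
masks['any'] = (masks['no_kid_not_widowed'] | masks['under_30']
                | masks['cheap_low_income'])

rates = {name: toy.loc[mask, 'Y'].mean() for name, mask in masks.items()}
print(rates)
```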


7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

* About 41% of people who were offered a bar coupon accepted it.
* The acceptance rate among people who went to a bar 3 or fewer times a month was 37%, while people who went more often accepted at a rate of 76.9%.
  This suggests that people who go to bars more often are more inclined to take the deals the coupon offers.
* People who go to bars more than once a month and are older than 25 accepted at a rate of 69.5%, compared with 33.5% for all others.
* Similarly, people who go to bars more than once a month, are not accompanied by a kid, and do not work in farming/fishing/forestry accepted at a rate of 71.8%. This comparison does not isolate the effect of occupation, but perhaps the farming/fishing/forestry folks are out working or doing outdoorsy activities rather than spending their time in a bar.
* Frequent bar-goers without a kid in the car (71.8%) and frequent bar-goers under 30 (72.2%) accepted at similarly high rates.
* Income may make a difference. People who often eat at cheap restaurants and are in the \\$37500 - \\$49999 income bracket accepted at a rate of only 49%, which suggests that cost-conscious, lower-income drivers are less likely to accept a bar coupon.
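
To gauge whether a gap such as 37% versus 77% could plausibly be chance, a two-proportion z-test is one option. The sketch below uses the counts printed earlier (671 of 1811 and 153 of 199) and only the standard library. It assumes the rows are independent observations, which is questionable here because each survey respondent answered several scenarios, so the result should be read as a rough indication. The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the bar-coupon output: 3 or fewer visits vs. more than 3 visits
z, p_value = two_proportion_z(671, 1811, 153, 199)
print(f'z = {z:.2f}, two-sided p = {p_value:.3g}')
```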

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

**Investigating the Coffee House Coupons**

1. Create a new `DataFrame` that contains just the Coffee House coupons.

In [357]:
coffeeHouseData = data[ data[ 'coupon'] == 'Coffee House']
coffeeHouseData.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,0,0,0,1,0
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,1,0,0,1,0
12,No Urgent Place,Kid(s),Sunny,55,6PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,1,0,0,1,1
15,Home,Alone,Sunny,80,6PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,never,4~8,1~3,1,0,0,0,1,0


2. Find out what proportion of Coffee House coupons were accepted

In [360]:
# proportion = number of rows in coffeeHouseData with Y=1 / total number of rows in coffeeHouseData
proportionCHCouponAccepted = coffeeHouseData.groupby('Y').count().iloc[1,0] / coffeeHouseData.shape[0]
print( f'Proportion of coffee house coupons accepted = {proportionCHCouponAccepted}')

Proportion of coffee house coupons accepted = 0.4986212083228879


3. Let's try to find out what kind of user is more likely to accept the Coffee House coupon
* If we look at a histogram of `CoffeeHouse` visit frequency colored by 'Y', a pattern should emerge

In [397]:
fig = px.histogram( coffeeHouseData, x='CoffeeHouse', color='Y')
print(coffeeHouseData.shape)
fig.show()

(3989, 25)


In [407]:
# Visit-frequency groups ('1~3' includes 3, so the split is "3 or fewer" vs "more than 3")
ltThreePerMonthCoffeeHouseVisit = ['less1', '1~3']
ltThreePerMonthCoffeeHouseVisitWithNever = ['never', 'less1', '1~3']
gtThreePerMonthCoffeeHouseVisit = ['gt8', '4~8']


# Coffee House coupons: drivers who visit 3 or fewer times a month, excluding 'never'
tmp1 = coffeeHouseData.query( 'CoffeeHouse in @ltThreePerMonthCoffeeHouseVisit')\
               .groupby('Y').count()
total1 = (tmp1.iloc[1,1] + tmp1.iloc[0,0])
rate1 = tmp1.iloc[1,1] / total1

# Caution: tmp2 and tmp3 are computed on barData (bar coupons), not coffeeHouseData
tmp2 = barData.query( 'CoffeeHouse in @ltThreePerMonthCoffeeHouseVisitWithNever')\
               .groupby('Y').count()
total2 = (tmp2.iloc[1,1] + tmp2.iloc[0,0])
rate2 = tmp2.iloc[1,1] / total2

tmp3 = barData.query( 'CoffeeHouse in @gtThreePerMonthCoffeeHouseVisit')\
               .groupby('Y').count()
total3 = (tmp3.iloc[1,1] + tmp3.iloc[0,0])
rate3 = tmp3.iloc[1,1] / total3

print( f'People who visit <=3 times a month: {tmp1.iloc[1,1]} out of {total1}: rate = {rate1}')
print( f'People who visit <=3 times a month (including never): {tmp2.iloc[1,1]} out of {total2}: rate = {rate2}')
print( f'People who visit >3 times a month: {tmp3.iloc[1,1]} out of {total3}: rate = {rate3}')

People who visit <=3 times a month: 1187 out of 2110: rate = 0.5625592417061611
People who visit <=3 times a month (including never): 611 out of 1545: rate = 0.39546925566343044
People who visit >3 times a month: 213 out of 465: rate = 0.45806451612903226


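Among Coffee House coupons, people who visit coffee houses 3 or fewer times a month (excluding those who never go) have an acceptance rate of about 56%, above the overall Coffee House acceptance rate of about 50%.
The other two figures (39.5% including 'never', and 45.8% for more than 3 visits a month) were computed on `barData`, so they describe bar coupon acceptance grouped by coffee house visits and cannot be compared directly with the 56%.
To compare visit-frequency groups for Coffee House coupons, the same queries need to be run on `coffeeHouseData`.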
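
One way to keep every group on the same frame is to map the raw visit labels to buckets and group once. The following is a minimal sketch on made-up rows; `toy_coffee`, the `bucket` mapping, and its bucket names are illustrative assumptions, and the rates it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical Coffee House coupon rows
toy_coffee = pd.DataFrame({
    'CoffeeHouse': ['never', 'less1', '1~3', '4~8', 'gt8', 'less1', 'never', '1~3'],
    'Y':           [0, 1, 1, 1, 0, 0, 0, 1],
})

# Map each raw label to one of three visit-frequency buckets
bucket = {
    'never': 'never',
    'less1': '3 or fewer',
    '1~3':   '3 or fewer',
    '4~8':   'more than 3',
    'gt8':   'more than 3',
}

# A single groupby on one frame yields accepted count, total, and rate per bucket
rates = (
    toy_coffee.assign(visits=toy_coffee['CoffeeHouse'].map(bucket))
              .groupby('visits')['Y']
              .agg(accepted='sum', total='count', rate='mean')
)
print(rates)
```

Applied to `coffeeHouseData`, this would give all visit-frequency rates from the same set of coupons in one table.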

4. Let's try to drill down some more
* How do age and destination affect acceptance of the coupon?
* Let's start with a filtered dataset of accepted coupons where the driver visits coffee houses 3 or fewer times a month (excluding those who never go)
* Let's plot a histogram of age and color it by destination

In [425]:
lt3Data = coffeeHouseData[coffeeHouseData['Y']==1].query( 'CoffeeHouse in @ltThreePerMonthCoffeeHouseVisit')
px.histogram(lt3Data, 'age', color='destination' )

The histogram counts accepted coupons only, so it shows where acceptances are concentrated rather than acceptance rates. Within this filtered group, acceptances appear concentrated among people who fit all of the following profiles:
* People who visit a coffee house up to 3 times a month (the `less1` and `1~3` groups) AND
* People who are <30 years of age AND
* People who are not headed to an urgent destination (not going to work or returning home)
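
Because the histogram above is built from accepted rows only, a table of acceptance rates by age and destination can help confirm the pattern. This is a minimal sketch on made-up rows (`toy` is hypothetical); `pivot_table` with `aggfunc='mean'` gives the rate in each cell, and a second table with `aggfunc='count'` shows how many rows back each rate.

```python
import pandas as pd

# Hypothetical Coffee House coupon rows, both accepted and rejected
toy = pd.DataFrame({
    'age': ['21', '21', '21', '41', '41', '41', '21', '41'],
    'destination': ['No Urgent Place', 'No Urgent Place', 'Work',
                    'No Urgent Place', 'Work', 'Work', 'Home', 'Home'],
    'Y': [1, 1, 0, 0, 0, 1, 1, 0],
})

# Acceptance rate for each age / destination combination
rate_table = toy.pivot_table(index='age', columns='destination',
                             values='Y', aggfunc='mean')

# Number of rows behind each rate; small cells deserve caution
count_table = toy.pivot_table(index='age', columns='destination',
                              values='Y', aggfunc='count')

print(rate_table)
print(count_table)
```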