### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver at all, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then asks each person whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \\$20 to \\$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [3]:
data = pd.read_csv('data/coupons.csv')

In [4]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

3. Decide what to do about your missing data -- drop, replace, other...

In [6]:
# count the missing values in each column
data.isnull().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

> Since most of the data in the 'car' column is missing, I will drop it.

In [None]:
data = data.drop('car', axis=1)
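
Dropping `car` handles the column that is almost entirely empty, but the five visit-frequency columns still contain a small number of missing values. The sketch below shows two common ways to deal with such gaps: dropping the affected rows, or filling each column with its most frequent value. It runs on a tiny made-up frame (`toy` and its values are illustrative, not the survey data), and neither option is something the notebook itself applies.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the survey data (values are illustrative only).
toy = pd.DataFrame({
    "Bar": ["never", "less1", np.nan, "never"],
    "CoffeeHouse": ["1~3", np.nan, "1~3", "4~8"],
    "Y": [1, 0, 1, 0],
})

# Fraction of missing values per column, to judge how much dropping rows would cost.
missing_fraction = toy.isnull().mean()

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: fill each column's gaps with that column's most frequent value (its mode).
filled = toy.fillna(toy.mode().iloc[0])

print(missing_fraction)
print(len(dropped), "rows remain after dropna")
```

Which option is appropriate depends on the question being asked; with only about 1 to 2% of values missing per column, either choice is unlikely to move the acceptance rates much, but that is an assumption worth checking.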

4. What proportion of the total observations chose to accept the coupon? 



In [None]:
proportionAccepted = data['Y'].value_counts() / len(data)
print(proportionAccepted)

Y
1    0.568433
0    0.431567
Name: count, dtype: float64


> About 56.8% of all observations accepted the coupon.
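
Because `Y` is coded as 0 or 1, its mean is the proportion of accepted coupons, so the same figure can be read off with a one-line helper. `acceptance_rate` is a hypothetical name introduced here for illustration, and the frame is made up.

```python
import pandas as pd


def acceptance_rate(df: pd.DataFrame) -> float:
    """Share of rows with Y == 1 (NaN if the frame is empty)."""
    return df["Y"].mean()


# Made-up example: three of four coupons accepted.
toy = pd.DataFrame({"Y": [1, 0, 1, 1]})
rate = acceptance_rate(toy)
print(f"{rate:.1%}")  # 75.0%
```

A helper like this could be reused for every subset in the sections that follow instead of repeating the `value_counts()` division.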

5. Use a bar plot to visualize the `coupon` column.

In [21]:
import plotly.express as px
px.bar(data, x='coupon', labels={'coupon':'Coupon Type', 'count':'Count'}, title='Number of Coupons presented for each Category')
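
Passing the raw rows to `px.bar` asks Plotly to draw a mark for every survey response, which tends to be slow on a frame of this size. One alternative, sketched here on made-up rows, is to count the coupons per category first and plot the small summary table instead.

```python
import pandas as pd

# Made-up rows for illustration; the real frame has one row per survey response.
toy = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Bar", "Coffee House", "Coffee House"],
})

# One row per coupon type with the number of times it was offered.
counts = toy.groupby("coupon").size().reset_index(name="count")
print(counts)
```

The resulting frame has one row per category and can be handed to `px.bar(counts, x='coupon', y='count')`.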

6. Use a histogram to visualize the temperature column.

In [42]:
px.histogram(data, x='temperature', labels={'temperature':'Temperature (in Fahrenheit)'}, title='Reported Temperature when Coupons were presented')

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [47]:
bardf = data[data['coupon']=='Bar']
bardf

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
13,Home,Alone,Sunny,55,6PM,Bar,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,1,0,1
17,Work,Alone,Sunny,55,7AM,Bar,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,1,0,1,0
24,No Urgent Place,Friend(s),Sunny,80,10AM,Bar,1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,0,1,1
35,Home,Alone,Sunny,55,6PM,Bar,1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12663,No Urgent Place,Friend(s),Sunny,80,10PM,Bar,1d,Male,26,Single,...,never,1~3,4~8,1~3,1,1,0,0,1,0
12664,No Urgent Place,Friend(s),Sunny,55,10PM,Bar,2h,Male,26,Single,...,never,1~3,4~8,1~3,1,1,0,0,1,0
12667,No Urgent Place,Alone,Rainy,55,10AM,Bar,1d,Male,26,Single,...,never,1~3,4~8,1~3,1,1,0,0,1,0
12670,No Urgent Place,Partner,Rainy,55,6PM,Bar,2h,Male,26,Single,...,never,1~3,4~8,1~3,1,1,0,0,1,0


2. What proportion of bar coupons were accepted?


In [50]:
proportionBarAccepted = bardf['Y'].value_counts() / len(bardf)
print(proportionBarAccepted)

Y
0    0.589985
1    0.410015
Name: count, dtype: float64


> About 41% of bar coupons were accepted

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [67]:
barThreeOrLess = bardf[bardf['Bar'].isin(['never', 'less1', '1~3'])]
barFourOrMore = bardf[bardf['Bar'].isin(['4~8', 'gt8'])]

In [68]:
proportionBar3OrLess = barThreeOrLess['Y'].value_counts() / len(barThreeOrLess)
print(proportionBar3OrLess)

Y
0    0.629382
1    0.370618
Name: count, dtype: float64


In [69]:
proportionBar4OrMore = barFourOrMore['Y'].value_counts() / len(barFourOrMore)
print(proportionBar4OrMore)

Y
1    0.768844
0    0.231156
Name: count, dtype: float64


> About 37.1% of drivers who go to bars 3 or fewer times a month accepted the bar coupon, compared with about 76.9% of those who go 4 or more times a month.
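
The two subsets can also be compared in a single step by mapping each frequency label to a group and letting `groupby` compute the rate and the group size together. This is a sketch on made-up rows; the labels follow the `Bar` column, while `group_of` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Bar": ["never", "less1", "1~3", "4~8", "gt8", np.nan],
    "Y": [0, 0, 1, 1, 1, 0],
})

# Map each survey label to one of the two comparison groups.
group_of = {
    "never": "3 or fewer", "less1": "3 or fewer", "1~3": "3 or fewer",
    "4~8": "4 or more", "gt8": "4 or more",
}

# Rows with a missing Bar value map to NaN and are left out by groupby.
rates = toy.groupby(toy["Bar"].map(group_of))["Y"].agg(["mean", "size"])
print(rates)
```

Reporting `size` next to `mean` makes it easy to see when a striking rate rests on only a handful of drivers.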

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others.  Is there a difference?


In [88]:
barMoreThanOnceOlder25 = bardf[(~bardf['age'].isin(['below21', '21'])) & (~bardf['Bar'].isin(['never', 'less1']))]

In [89]:
proportionMoreThanOnceOlder25 = barMoreThanOnceOlder25['Y'].value_counts() / len(barMoreThanOnceOlder25)
print(proportionMoreThanOnceOlder25)

Y
1    0.681818
0    0.318182
Name: count, dtype: float64


> About 68.2% of drivers who are over 25 and go to bars more than once a month accepted the bar coupon, compared with the 41% acceptance rate for bar coupons overall (step 2). Note that this baseline includes the over-25 frequent bar-goers themselves rather than only all other drivers.
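
To compare the group against all other drivers directly, the condition can be stored as a boolean mask and negated. The sketch below uses made-up rows and selects frequent bar-goers with a positive `isin` list. That choice matters for missing values: `isin` returns `False` for NaN, so `~isin(['never', 'less1'])` would count drivers with no recorded `Bar` answer as frequent visitors, whereas the positive list leaves them in the "others" group.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; labels follow the notebook's columns.
toy = pd.DataFrame({
    "age": ["21", "26", "31", "50plus", "below21", "41"],
    "Bar": ["1~3", "4~8", "never", "gt8", "1~3", np.nan],
    "Y": [0, 1, 0, 1, 1, 0],
})

over_25 = ~toy["age"].isin(["below21", "21"])
more_than_once = toy["Bar"].isin(["1~3", "4~8", "gt8"])  # NaN counts as not frequent
target = over_25 & more_than_once

rate_target = toy.loc[target, "Y"].mean()   # drivers meeting both conditions
rate_others = toy.loc[~target, "Y"].mean()  # everyone else
print(rate_target, rate_others)
```

How to treat the missing answers is a judgment call; the point of the sketch is only that the choice should be made explicitly.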

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [100]:
barMoreThanOnce = bardf[~bardf['Bar'].isin(['never', 'less1'])]
barAdultPassenger = bardf[~bardf['passanger'].isin(['Alone', 'Kid(s)'])]
barOccupationNotFFF = bardf[~bardf['occupation'].isin(['Farming Fishing & Forestry'])]

In [102]:
proportionBarMoreThanOnce = barMoreThanOnce['Y'].value_counts() / len(barMoreThanOnce)
print(proportionBarMoreThanOnce)

Y
1    0.677472
0    0.322528
Name: count, dtype: float64


In [103]:
proportionBarAdultPassenger = barAdultPassenger['Y'].value_counts() / len(barAdultPassenger)
print(proportionBarAdultPassenger)

Y
0    0.517185
1    0.482815
Name: count, dtype: float64


In [104]:
proportionBarOccupationNotFFF = barOccupationNotFFF['Y'].value_counts() / len(barOccupationNotFFF)
print(proportionBarOccupationNotFFF)

Y
0    0.590139
1    0.409861
Name: count, dtype: float64


> About 67.7% of people who go to bars more than once a month accepted the bar coupon  
> About 48.3% of people who had an adult passenger (not kid or alone) accepted the bar coupon  
> About 41.0% of people who had occupations outside of the Fishing Farming & Forestry industry accepted the bar coupon  

In [105]:
barMoreThanOncePassengerOccupations = bardf[(~bardf['Bar'].isin(['never', 'less1'])) & (~bardf['passanger'].isin(['Alone', 'Kid(s)'])) & (~bardf['occupation'].isin(['Farming Fishing & Forestry']))]

In [106]:
proportionBarMoreThanOncePassengerOccupations = barMoreThanOncePassengerOccupations['Y'].value_counts() / len(barMoreThanOncePassengerOccupations)
print(proportionBarMoreThanOncePassengerOccupations)

Y
1    0.707317
0    0.292683
Name: count, dtype: float64


> When combining all three conditions for frequency, passengers, and occupations, the bar coupon was accepted approx. 70.7% of the time!

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [213]:
barFinalComparison = bardf[
    # bars more than once a month, passenger not a kid (or alone), not widowed
    (
        (~bardf['Bar'].isin(['never', 'less1']))
        & (~bardf['passanger'].isin(['Alone', 'Kid(s)']))
        & (~bardf['maritalStatus'].isin(['Widowed']))
    )
    # OR bars more than once a month and under 30
    | (
        (~bardf['Bar'].isin(['never', 'less1']))
        & (bardf['age'].isin(['below21', '21', '26']))
    )
    # OR cheap restaurants 4+ times a month and income below 50K
    | (
        (bardf['RestaurantLessThan20'].isin(['4~8', 'gt8']))
        & (bardf['income'].isin(['$25000 - $37499', '$12500 - $24999', '$37500 - $49999', 'Less than $12500']))
    )
]

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

In [215]:
proportionBarFinalComparison = barFinalComparison['Y'].value_counts() / len(barFinalComparison)
print(proportionBarFinalComparison)

Y
1    0.568249
0    0.431751
Name: count, dtype: float64


> Combining all the conditions listed under step 6, we can see an overall acceptance rate of about 56.8% for bar coupons
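
A single rate for the union does not show which of the three conditions drives acceptance. One way to see that, sketched here on made-up rows, is to keep each condition as a named boolean mask and report a rate per mask alongside the rate for the union. The dictionary keys and the `low_income` list are names chosen for this illustration; the passenger condition follows the notebook's reading of "not a kid" as having a friend or partner in the car.

```python
import pandas as pd

# Made-up rows for illustration only; column names and labels follow the notebook.
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "4~8", "less1"],
    "passanger": ["Friend(s)", "Alone", "Kid(s)", "Partner"],
    "maritalStatus": ["Single", "Widowed", "Single", "Married partner"],
    "age": ["26", "41", "21", "50plus"],
    "RestaurantLessThan20": ["1~3", "4~8", "less1", "gt8"],
    "income": ["$50000 - $62499", "$12500 - $24999", "$37500 - $49999", "$100000 or More"],
    "Y": [1, 0, 1, 0],
})

low_income = ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"]
frequent = toy["Bar"].isin(["1~3", "4~8", "gt8"])

masks = {
    "frequent, adult passenger, not widowed": (
        frequent
        & toy["passanger"].isin(["Friend(s)", "Partner"])
        & (toy["maritalStatus"] != "Widowed")
    ),
    "frequent, under 30": frequent & toy["age"].isin(["below21", "21", "26"]),
    "cheap restaurants 4+, income < 50K": (
        toy["RestaurantLessThan20"].isin(["4~8", "gt8"]) & toy["income"].isin(low_income)
    ),
}

# Union of the three conditions, built up one mask at a time.
any_condition = pd.Series(False, index=toy.index)
for mask in masks.values():
    any_condition |= mask

rates = {name: toy.loc[mask, "Y"].mean() for name, mask in masks.items()}
rates["any of the three"] = toy.loc[any_condition, "Y"].mean()
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```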

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

> I will investigate the acceptance rates for Coffee House coupons

> Create a new dataframe for only the Coffee House coupons

In [112]:
coffeedf = data[data['coupon']=='Coffee House']

> checking baseline acceptance rate for coffee house coupons

In [121]:
proportionCoffee = coffeedf['Y'].value_counts() / len(coffeedf)
proportionCoffee

Y
0    0.500751
1    0.499249
Name: count, dtype: float64

> Overall, about 49.9% of people who were presented coupons to a Coffee House accepted it.

> Let's compare the baseline rate to those who are going to work, and the coffee shop is in the same direction as their workplace.

In [128]:
coffeeDestinationDirection = coffeedf[(coffeedf['destination'] == 'Work') & (coffeedf['direction_same'] == 1)]

In [129]:
proportionCoffeeDestinationDirection = coffeeDestinationDirection['Y'].value_counts() / len(coffeeDestinationDirection)
proportionCoffeeDestinationDirection

Y
1    0.576832
0    0.423168
Name: count, dtype: float64

> For people who are headed to work and the coffee shop is in the same direction, the acceptance rate is approx. 57.7% for the coupon.

> Let's take a look at how the temperature and weather conditions impacted the acceptance rates (warm and sunny vs. some form of precipitation or not warm)

In [153]:
coffeeSunnyWarm = coffeedf[(coffeedf['weather']=='Sunny') & (coffeedf['temperature']==80)]
coffeePrecipitationNotWarm = coffeedf[(coffeedf['weather']!='Sunny') | (coffeedf['temperature']!=80) ]

In [154]:
proportionCoffeeSunnyWarm = coffeeSunnyWarm['Y'].value_counts() / len(coffeeSunnyWarm)
proportionCoffeeSunnyWarm

Y
1    0.529779
0    0.470221
Name: count, dtype: float64

In [155]:
proportionCoffeePrecipitationNotWarm = coffeePrecipitationNotWarm['Y'].value_counts() / len(coffeePrecipitationNotWarm)
proportionCoffeePrecipitationNotWarm

Y
0    0.546708
1    0.453292
Name: count, dtype: float64

> Temperature and weather appear to have only a modest effect on coffee house coupons: acceptance was about 53.0% in sunny, 80°F conditions versus about 45.3% otherwise, both within a few points of the 49.9% baseline.
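
Whether a gap between two acceptance rates is larger than chance alone would produce can be checked with a two-proportion z-test. The sketch below is a standard pooled version written with the standard library only; `two_proportion_z` is a hypothetical helper, and the counts passed to it are invented for illustration, not taken from the survey.

```python
from math import erfc, sqrt


def two_proportion_z(accepted_1, n_1, accepted_2, n_2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_1 = accepted_1 / n_1
    p_2 = accepted_2 / n_2
    pooled = (accepted_1 + accepted_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Invented counts: 530 of 1000 accepted in one group, 453 of 1000 in the other.
z, p_value = two_proportion_z(530, 1000, 453, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

To apply it to the notebook's subsets, the accepted counts and group sizes would come from the two DataFrames being compared, for example `df['Y'].sum()` and `len(df)`.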

> Let's explore how the frequency that people visit Coffee houses affects the acceptance rates for coupons

In [157]:
coffeeThreeOrless = coffeedf[coffeedf['CoffeeHouse'].isin(['never', 'less1','1~3'])]
coffeeFourOrMore = coffeedf[coffeedf['CoffeeHouse'].isin(['4~8','gt8'])]

In [158]:
proportionCoffeeThreeOrLess = coffeeThreeOrless['Y'].value_counts() / len(coffeeThreeOrless)
proportionCoffeeThreeOrLess

Y
0    0.550591
1    0.449409
Name: count, dtype: float64

In [159]:
proportionCoffeeFourOrMore = coffeeFourOrMore['Y'].value_counts() / len(coffeeFourOrMore)
proportionCoffeeFourOrMore

Y
1    0.675
0    0.325
Name: count, dtype: float64

> People who visit coffee houses three times or less in a month accepted the coffee house coupons approx. 44.9% of the time, vs the 67.5% acceptance rate for the same coupons when the person visits coffee houses four or more times in a month!

> Lastly, let's visualize how the industry of a person's occupation affects the acceptance rate of the coffee house coupons

In [219]:
proportionCoffeeByOccupation = coffeedf[coffeedf['Y']==1].groupby('occupation').size() / coffeedf.groupby('occupation').size()

In [220]:
px.bar(proportionCoffeeByOccupation, labels={'value':'Coupon Acceptance Rate'}, title='Acceptance Rates for Coffee House Coupons by Occupation')
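
Dividing one `groupby(...).size()` by another yields NaN rather than 0 for any occupation with no acceptances, and it hides how many drivers each bar represents. A possible alternative, sketched on made-up rows, takes the mean of `Y` per occupation and keeps the group size next to it. The column names `rate` and `n` are chosen here for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "occupation": ["Student", "Student", "Retired", "Retired", "Retired", "Legal"],
    "Y": [1, 1, 0, 1, 0, 1],
})

# Acceptance rate and number of responses per occupation, highest rate first.
by_occupation = (
    toy.groupby("occupation")["Y"]
    .agg(rate="mean", n="size")
    .sort_values("rate", ascending=False)
)
print(by_occupation)
```

Sorting by rate makes the bar chart easier to read, and the `n` column flags occupations whose rate rests on very few responses.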