### Required Assignment 5.1: Will the Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes from the UCI Machine Learning Repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks the respondent whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and answers of ‘no, I do not want the coupon’ are labeled ‘Y = 0’. There are five types of coupons: less expensive restaurants (under \$20), coffee houses, carry out & take away, bars, and more expensive restaurants (\$20 - \$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data, you will use your knowledge of plotting, statistical summaries, and visualization in Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \$12500, \$12500 - \$24999, \$25000 - \$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \$20 to \$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [635]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [702]:
data = pd.read_csv('coupons.csv')

In [703]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [330]:
# Count nulls AND create dataframe
counts = data.isnull().sum().sort_values()
missing_df = pd.DataFrame({'Category': counts.index, 'Missing Count': counts.values})

# Create bar chart
fig = px.bar(
    missing_df, 
    x='Category', 
    y='Missing Count', 
    text='Missing Count', 
    title="Missing Values or NULL in Each Category"
)

fig.update_traces(textposition='outside')
fig.update_layout(
    xaxis_tickangle=-45,  # Rotate x-axis labels
    yaxis_title="Missing Values Count",
    xaxis_title="Categories",
    height=600, width=1000
)

fig.show()
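
As a complement to the chart, the same missing-value counts can be read as a small table with percentages. This is only a sketch on a made-up toy frame (the `toy` data and the `missing_summary` name are assumptions for illustration, not part of the coupon dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the coupon data (values are made up for illustration)
toy = pd.DataFrame({
    "car": [np.nan, np.nan, np.nan, "Mazda5"],
    "Bar": ["never", np.nan, "1~3", "less1"],
    "Y": [1, 0, 1, 0],
})

# Count and percentage of missing values per column, largest first
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": (toy.isnull().mean() * 100).round(1),
}).sort_values("missing_count", ascending=False)

print(missing_summary)
```

A percentage column makes it easier to separate a column that is almost entirely empty from one that is missing only a handful of answers.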


In [39]:
def find_bad_data(df):
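    # Flag every cell whose value is a Python string (DataFrame.map requires pandas >= 2.1)
    bad_data_mask = df.map(lambda x: isinstance(x, str))
    
    # Select rows where at least one column holds a string; columns such as
    # 'destination' are always strings, so this keeps every row of this dataset
    bad_data_indices = df[bad_data_mask.any(axis=1)]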
    
    return bad_data_indices

# Example usage
bad_data_rows = find_bad_data(data)


In [84]:
# Count occurrences of each value in the 'age' column
bad_data_counts = bad_data_rows['age'].astype(str).value_counts().reset_index()

In [85]:
#### Check Other problematic data : All Strings #####

bad_data_counts.columns = ['Bad Data Type', 'Count']

# Sort values before plotting
bad_data_counts = bad_data_counts.sort_values(by='Count')

# Create a bar chart in Plotly
fig = px.bar(
    bad_data_counts, 
    x='Bad Data Type', 
    y='Count', 
    text='Count', 
    title="Data in 'Age' Column",
    labels={'Bad Data Type': 'All Data Values', 'Count': 'Count'},
    color='Bad Data Type'
)

fig.update_traces(textposition='outside')

fig.update_layout(width=800, height=500, xaxis_tickangle=45)

In [62]:
data['age'].value_counts()

age
21         2653
26         2559
31         2039
50plus     1788
36         1319
41         1093
46          686
below21     547
Name: count, dtype: int64

In [47]:
bad_data_counts = bad_data_rows['CoffeeHouse'].astype(str).value_counts().reset_index()

In [48]:

bad_data_counts.columns = ['Bad Data Type', 'Count']

# Sort values before plotting
bad_data_counts = bad_data_counts.sort_values(by='Count')

# Create a bar chart in Plotly
fig = px.bar(
    bad_data_counts, 
    x='Bad Data Type', 
    y='Count', 
    text='Count', 
    title="Data in 'Coffee House' Column",
    labels={'Bad Data Type': 'All Data Values', 'Count': 'Count'},
    color='Bad Data Type' 
)

# Format the text labels on bars
fig.update_traces(textposition='outside')

# Set figure size and rotate x-axis labels
fig.update_layout(width=800, height=500, xaxis_tickangle=45)

In [328]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

In [94]:
data['destination'].value_counts() # NO Issues

destination
No Urgent Place    6283
Home               3237
Work               3164
Name: count, dtype: int64

In [95]:
data['passanger'].value_counts() # minor issues (s)

passanger
Alone        7305
Friend(s)    3298
Partner      1075
Kid(s)       1006
Name: count, dtype: int64

In [96]:
data['weather'].value_counts() # NO Issues

weather
Sunny    10069
Snowy     1405
Rainy     1210
Name: count, dtype: int64

In [97]:
data['temperature'].value_counts() # NO Issues

temperature
80    6528
55    3840
30    2316
Name: count, dtype: int64

In [98]:
data['time'].value_counts() # NO Issues

time
6PM     3230
7AM     3164
10AM    2275
2PM     2009
10PM    2006
Name: count, dtype: int64

In [99]:
data['coupon'].value_counts() # Yes Minor Issues - Restaurant(20-50), Restaurant(<20)

coupon
Coffee House             3996
Restaurant(<20)          2786
Carry out & Take away    2393
Bar                      2017
Restaurant(20-50)        1492
Name: count, dtype: int64

In [100]:
data['expiration'].value_counts() # NO Issues

expiration
1d    7091
2h    5593
Name: count, dtype: int64

In [101]:
data['gender'].value_counts() # NO Issues

gender
Female    6511
Male      6173
Name: count, dtype: int64

In [83]:
data['age'].value_counts() # NO Issues

age
21         2653
26         2559
31         2039
50plus     1788
36         1319
41         1093
46          686
below21     547
Name: count, dtype: int64

In [103]:
data['maritalStatus'].value_counts() # No Issues

maritalStatus
Married partner      5100
Single               4752
Unmarried partner    2186
Divorced              516
Widowed               130
Name: count, dtype: int64

In [104]:
data['has_children'].value_counts() # No Issues

has_children
0    7431
1    5253
Name: count, dtype: int64

In [105]:
data['education'].value_counts() # No Issues

education
Some college - no degree                  4351
Bachelors degree                          4335
Graduate degree (Masters or Doctorate)    1852
Associates degree                         1153
High School Graduate                       905
Some High School                            88
Name: count, dtype: int64

In [106]:
data['occupation'].value_counts() # YES Minor Issues  Education&Training&Library occupation

occupation
Unemployed                                   1870
Student                                      1584
Computer & Mathematical                      1408
Sales & Related                              1093
Education&Training&Library                    943
Management                                    838
Office & Administrative Support               639
Arts Design Entertainment Sports & Media      629
Business & Financial                          544
Retired                                       495
Food Preparation & Serving Related            298
Healthcare Practitioners & Technical          244
Healthcare Support                            242
Community & Social Services                   241
Legal                                         219
Transportation & Material Moving              218
Architecture & Engineering                    175
Personal Care & Service                       175
Protective Service                            175
Life Physical Social Science           

In [70]:
data['income'].value_counts() # YES ISSUES

income
$25000 - $37499     2013
$12500 - $24999     1831
$37500 - $49999     1805
$100000 or More     1736
$50000 - $62499     1659
Less than $12500    1042
$87500 - $99999      895
$75000 - $87499      857
$62500 - $74999      846
Name: count, dtype: int64

In [109]:
data['car'].value_counts() # Some Issues ...Yes Car that is too old to install Onstar :D 

car
Scooter and motorcycle                      22
Mazda5                                      22
do not drive                                22
crossover                                   21
Car that is too old to install Onstar :D    21
Name: count, dtype: int64

In [110]:
data['car'].isnull().sum()

12576

In [333]:
data['Bar'].value_counts() # NO ISSUES 

Bar
never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: count, dtype: int64

In [111]:
data['CoffeeHouse'].value_counts()   # NO ISSUES

CoffeeHouse
less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: count, dtype: int64

In [112]:
data['CarryAway'].value_counts()   # NO ISSUES

CarryAway
1~3      4672
4~8      4258
less1    1856
gt8      1594
never     153
Name: count, dtype: int64

In [76]:
data['RestaurantLessThan20'].value_counts() # NO ISSUES

RestaurantLessThan20
1~3      5376
4~8      3580
less1    2093
gt8      1285
never     220
Name: count, dtype: int64

In [113]:
data['Restaurant20To50'].value_counts() # NO ISSUES

Restaurant20To50
less1    6077
1~3      3290
never    2136
4~8       728
gt8       264
Name: count, dtype: int64

In [77]:
data['toCoupon_GEQ5min'].value_counts() # NO ISSUE

toCoupon_GEQ5min
1    12684
Name: count, dtype: int64

In [87]:
data['toCoupon_GEQ15min'].value_counts() # NO ISSUE

toCoupon_GEQ15min
1    7122
0    5562
Name: count, dtype: int64

In [88]:
data['toCoupon_GEQ25min'].value_counts()  # NO ISSUE

toCoupon_GEQ25min
0    11173
1     1511
Name: count, dtype: int64

In [89]:
data['direction_same'].value_counts() # NO ISSUE

direction_same
0    9960
1    2724
Name: count, dtype: int64

In [90]:
data['direction_opp'].value_counts() # NO ISSUE

direction_opp
1    9960
0    2724
Name: count, dtype: int64

In [82]:
data['Y'].value_counts() # NO ISSUE

Y
1    7210
0    5474
Name: count, dtype: int64

In [None]:
#********************* END *************************

3. Decide what to do about your missing data -- drop, replace, other...

In [None]:
########## Decide what to do about your missing data -- drop, replace, other...#########

In [143]:
data['car'] = data['car'].fillna('No Data Available')
data['car'].head(5)

0    No Data Available
1    No Data Available
2    No Data Available
3    No Data Available
4    No Data Available
Name: car, dtype: object

In [125]:
# Count missing values
counts = data.isnull().sum().sort_values()

# Create a DataFrame for Plotly
missing_df = pd.DataFrame({'Category': counts.index, 'Missing Count': counts.values})

# Create bar chart using Plotly
fig = px.bar(
    missing_df, 
    x='Category', 
    y='Missing Count', 
    text='Missing Count',  # Add total count labels on each bar
    title="Missing Values in Each Category"
)

# Update text position and style
fig.update_traces(textposition='outside')

# Update layout for better readability
fig.update_layout(
    xaxis_tickangle=-45,  # Rotate x-axis labels
    yaxis_title="Missing Values Count",
    xaxis_title="Categories",
    height=600, width=1000
)

# Show plot
fig.show()

# Save plot as an HTML file 
fig.write_html("missing_plot.html")


In [129]:
data['Bar'] = data['Bar'].fillna('No Data Available')
data['Bar'].head(5)

0    never
1    never
2    never
3    never
4    never
Name: Bar, dtype: object

In [134]:
data['RestaurantLessThan20'] = data['RestaurantLessThan20'].fillna('No Data Available')
data['RestaurantLessThan20'].head(5)

0    4~8
1    4~8
2    4~8
3    4~8
4    4~8
Name: RestaurantLessThan20, dtype: object

In [137]:
data['CarryAway'] = data['CarryAway'].fillna('No Data Available')
data['CarryAway'].head(5)

0    No Data Available
1    No Data Available
2    No Data Available
3    No Data Available
4    No Data Available
Name: CarryAway, dtype: object

In [139]:
data['Restaurant20To50'] = data['Restaurant20To50'].fillna('No Data Available')
data['Restaurant20To50'].head(5)

0    1~3
1    1~3
2    1~3
3    1~3
4    1~3
Name: Restaurant20To50, dtype: object

In [140]:
data['CoffeeHouse'] = data['CoffeeHouse'].fillna('No Data Available')
data['CoffeeHouse'].head(5)

0    never
1    never
2    never
3    never
4    never
Name: CoffeeHouse, dtype: object
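
The five frequency columns above all get the same placeholder, so the fills could also be written as one assignment. A minimal sketch on toy data, assuming the same `'No Data Available'` sentinel (the `toy` frame and `frequency_cols` list are illustrative names):

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in two of the visit-frequency columns
toy = pd.DataFrame({
    "Bar": ["never", np.nan, "1~3"],
    "CoffeeHouse": [np.nan, "less1", "gt8"],
    "Y": [1, 0, 1],
})

# Fill several columns with the same sentinel in one step
frequency_cols = ["Bar", "CoffeeHouse"]
toy[frequency_cols] = toy[frequency_cols].fillna("No Data Available")

# Confirm nothing is left missing
remaining = int(toy.isnull().sum().sum())
print(remaining)
```

Keeping the sentinel as its own category preserves every row, at the cost of one extra level in each column that later filters need to account for.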

In [141]:
# Count missing values
counts = data.isnull().sum().sort_values()

# Create a DataFrame for Plotly
missing_df = pd.DataFrame({'Category': counts.index, 'Missing Count': counts.values})

# Create bar chart using Plotly
fig = px.bar(
    missing_df, 
    x='Category', 
    y='Missing Count', 
    text='Missing Count',  # Add total count labels on each bar
    title="Missing Values in Each Category"
)

# Update text position and style
fig.update_traces(textposition='outside')

# Update layout for better readability
fig.update_layout(
    xaxis_tickangle=-45,  # Rotate x-axis labels
    yaxis_title="Missing Values Count",
    xaxis_title="Categories",
    height=600, width=1000
)

# Show plot
fig.show()

# Save plot as an HTML file 
fig.write_html("missing_plot.html")

In [131]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   12684 non-null  object
 15  Bar                   12684 non-null

In [332]:
data.head(10)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
5,No Urgent Place,Friend(s),Sunny,80,6PM,Restaurant(<20),2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
6,No Urgent Place,Friend(s),Sunny,55,2PM,Carry out & Take away,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
7,No Urgent Place,Kid(s),Sunny,80,10AM,Restaurant(<20),2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
8,No Urgent Place,Kid(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


4. What proportion of the total observations chose to accept the coupon?



In [144]:
data['coupon'].value_counts() # Restaurant(20-50), Restaurant(<20) YES ISSUES

coupon
Coffee House             3996
Restaurant(<20)          2786
Carry out & Take away    2393
Bar                      2017
Restaurant(20-50)        1492
Name: count, dtype: int64

In [291]:
df = data[['coupon', 'Y']]
df

Unnamed: 0,coupon,Y
0,Restaurant(<20),1
1,Coffee House,0
2,Carry out & Take away,1
3,Coffee House,0
4,Coffee House,0
...,...,...
12679,Carry out & Take away,1
12680,Carry out & Take away,1
12681,Coffee House,0
12682,Bar,0


In [182]:
totalCoupon = df['coupon'].value_counts().sum()
totalCoupon

12684

In [301]:
accept_df = df.query("Y==1")[['coupon']].reset_index()
accept_df [['coupon']]

Unnamed: 0,coupon
0,Restaurant(<20)
1,Carry out & Take away
2,Restaurant(<20)
3,Carry out & Take away
4,Restaurant(<20)
...,...
7205,Restaurant(<20)
7206,Restaurant(20-50)
7207,Restaurant(<20)
7208,Carry out & Take away


In [302]:
######## Ratio : total observations chose to accept the coupon ##############
ratio_accept = (accept_df["coupon"].value_counts().sum())/totalCoupon
ratio_accept

0.5684326710816777
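
Because `Y` is coded 0/1, the same proportion can be obtained as the mean of the column. A short sketch on toy data (the `toy` frame is made up; on the full dataset this would correspond to 7210 / 12684):

```python
import pandas as pd

# Toy data: three of four coupons accepted
toy = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Bar", "Restaurant(<20)"],
    "Y": [1, 0, 1, 1],
})

# The mean of a 0/1 column is the proportion of ones
acceptance_rate = toy["Y"].mean()
print(acceptance_rate)

# The same idea using the counts reported in this notebook
rate_from_counts = 7210 / 12684
print(round(rate_from_counts, 3))
```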

5. Use a bar plot to visualize the `coupon` column.

In [239]:
import plotly.express as px

# Count occurrences of each coupon type
coupon_counts = df['coupon'].value_counts().reset_index()
coupon_counts.columns = ['coupon', 'count']

# Create bar plot
fig = px.bar(coupon_counts, x='coupon', y='count', 
             title='Coupon Usage Count', 
             labels={'coupon': 'Coupon Type', 'count': 'Count'}, 
             text='count')

# Show figure
fig.show()



6. Use a histogram to visualize the temperature column.

In [320]:
# Histogram
fig = px.histogram(data, x='temperature', 
                   title='Temperature Distribution',
                   labels={'temperature': 'Temperature'},
                   nbins=20,  # Adjust bins for better granularity
                   opacity=0.6,
                   color_discrete_sequence=px.colors.qualitative.Set2)

# Show figure
fig.show()
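
Since `temperature` takes only three distinct values (30, 55, and 80), a sorted count table is a quick cross-check on what the histogram shows. A small sketch on made-up values (the `temps` series is an assumption for illustration):

```python
import pandas as pd

# Toy temperatures using the three levels present in the dataset
temps = pd.Series([80, 55, 80, 30, 80, 55], name="temperature")

# Count each level and order by temperature rather than by frequency
temp_counts = temps.value_counts().sort_index()
print(temp_counts)
```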


**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [280]:
# Keep every column for the bar-coupon rows so later questions can filter on them
Bar_Coupons = data.query("coupon == 'Bar'").reset_index(drop=True)
Bar_Coupons[['coupon']]

Unnamed: 0,coupon
0,Bar
1,Bar
2,Bar
3,Bar
4,Bar
...,...
2012,Bar
2013,Bar
2014,Bar
2015,Bar


2. What proportion of bar coupons were accepted?


In [304]:
# Accepted bar coupons only
Bar_df = df.query("Y == 1 and coupon == 'Bar'")[['coupon']].reset_index(drop=True)
print(Bar_df)

    coupon
0      Bar
1      Bar
2      Bar
3      Bar
4      Bar
..     ...
822    Bar
823    Bar
824    Bar
825    Bar
826    Bar

[827 rows x 1 columns]


In [309]:
totalCoupon = df['coupon'].value_counts().sum()
totalCoupon

12684

In [312]:
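# Note: the denominator is all 12684 coupons of every type, so this is the share of
# all offers that were accepted bar coupons, not the acceptance rate among bar coupons
ratio_Bar = (Bar_df.value_counts().sum())/totalCoupon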
ratio_Bar

0.065200252286345
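
The ratio above divides the 827 accepted bar coupons by all 12684 coupons. The proportion of bar coupons that were accepted uses the 2017 bar coupons as the denominator instead. A hedged sketch, first on toy data and then with the counts printed earlier in this notebook (`toy` and `bar_only` are illustrative names):

```python
import pandas as pd

# Toy data: four bar coupons, two of them accepted
toy = pd.DataFrame({
    "coupon": ["Bar", "Bar", "Coffee House", "Bar", "Restaurant(<20)", "Bar"],
    "Y": [1, 0, 1, 0, 1, 1],
})

# Restrict to bar coupons first, then take the mean of Y
bar_only = toy[toy["coupon"] == "Bar"]
bar_acceptance_rate = bar_only["Y"].mean()
print(bar_acceptance_rate)

# Same calculation from the counts reported in this notebook
rate_from_counts = 827 / 2017
print(round(rate_from_counts, 2))
```

With the notebook's counts this comes to roughly 41% of bar coupons accepted.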

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [331]:
data.head(10)


Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
5,No Urgent Place,Friend(s),Sunny,80,6PM,Restaurant(<20),2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
6,No Urgent Place,Friend(s),Sunny,55,2PM,Carry out & Take away,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
7,No Urgent Place,Kid(s),Sunny,80,10AM,Restaurant(<20),2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
8,No Urgent Place,Kid(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


In [365]:
ThreeOrLessVisitBar_df = data.query("Y == 1 and coupon == 'Bar' and (Bar == '1~3' | Bar == 'less1')")[['Y','coupon','Bar']]

In [366]:
ThreeOrLessVisitBar_df.reset_index()[['Y','coupon','Bar']]

Unnamed: 0,Y,coupon,Bar
0,1,Bar,less1
1,1,Bar,less1
2,1,Bar,less1
3,1,Bar,1~3
4,1,Bar,1~3
...,...,...,...
505,1,Bar,1~3
506,1,Bar,1~3
507,1,Bar,1~3
508,1,Bar,less1


In [380]:
ThreeOrLessVisitBar_Count_df = ThreeOrLessVisitBar_df.value_counts().sum()
ThreeOrLessVisitBar_Count_df

510

In [376]:
MoreVisitBar_df = data.query("Y == 1 and coupon == 'Bar' and (Bar == '4~8' | Bar == 'gt8')")[['Y','coupon','Bar']]

In [377]:
MoreVisitBar_df.reset_index()[['Y','coupon','Bar']]

Unnamed: 0,Y,coupon,Bar
0,1,Bar,gt8
1,1,Bar,gt8
2,1,Bar,gt8
3,1,Bar,gt8
4,1,Bar,4~8
...,...,...,...
148,1,Bar,4~8
149,1,Bar,4~8
150,1,Bar,4~8
151,1,Bar,4~8


In [390]:
MoreVisitBar_Count_df = MoreVisitBar_df.value_counts().sum()
MoreVisitBar_Count_df

153

In [389]:
TotalCouponAccepted = df.query("Y == 1").value_counts().sum()
TotalCouponAccepted

7210

In [392]:
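# Bar coupons accepted by drivers who go to a bar 3 or fewer times a month (excluding 'never'),
# as a percentage of all accepted coupons
Acceptance_Rate_A = (ThreeOrLessVisitBar_Count_df/TotalCouponAccepted)*100

In [394]:
# Bar coupons accepted by drivers who go to a bar 4 or more times a month,
# as a percentage of all accepted coupons
Acceptance_Rate_B = (MoreVisitBar_Count_df/TotalCouponAccepted)*100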

In [393]:
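Acceptance_Rate_A # Share of all accepted coupons: bar coupons accepted by drivers with 3 or fewer bar visits a month

7.073509015256588

In [395]:
Acceptance_Rate_B # Share of all accepted coupons: bar coupons accepted by drivers with 4 or more bar visits a month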

2.1220527045769764
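
The percentages above use all 7210 accepted coupons as the denominator, so they mostly reflect how large each group is. To compare acceptance rates, each group's accepted bar coupons can be divided by the bar coupons offered to that same group. A minimal sketch on toy data (the `toy` frame and mask names are assumptions):

```python
import pandas as pd

# Toy bar-coupon rows with visit frequency and outcome
toy = pd.DataFrame({
    "coupon": ["Bar"] * 8,
    "Bar": ["never", "less1", "1~3", "1~3", "4~8", "gt8", "4~8", "never"],
    "Y": [0, 0, 1, 0, 1, 1, 0, 0],
})

bar_only = toy[toy["coupon"] == "Bar"]

# True for drivers who go to a bar more than 3 times a month.
# Missing or placeholder answers are not in the list, so they fall in the other group.
frequent = bar_only["Bar"].isin(["4~8", "gt8"])

# Acceptance rate within each group: accepted / offered
rate_3_or_fewer = bar_only.loc[~frequent, "Y"].mean()
rate_more_than_3 = bar_only.loc[frequent, "Y"].mean()
print(rate_3_or_fewer, rate_more_than_3)
```

Here the "3 or fewer" group includes `never`, which the queries above leave out; whether to include it is a judgment call that should be stated alongside the result.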

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [407]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

In [418]:
#drivers who go to a bar more than once a month and are over the age of 25

Driver_Set1_df = data.query("Y == 1 and (age != '21' and age != 'below21' ) and (Bar == '1~3' or Bar == '4~8' or Bar == 'gt8')")[['Y', 'age', 'Bar']]


In [419]:
Driver_Set1_df.reset_index()[['Y','age','Bar']]

Unnamed: 0,Y,age,Bar
0,1,26,1~3
1,1,26,1~3
2,1,26,1~3
3,1,26,1~3
4,1,26,1~3
...,...,...,...
1721,1,26,1~3
1722,1,26,1~3
1723,1,26,1~3
1724,1,26,1~3


In [421]:
Driver_Set1_df_counts = Driver_Set1_df.value_counts().sum()
Driver_Set1_df_counts

1726

In [413]:
TotalCouponAccepted = df.query("Y == 1").value_counts().sum()
TotalCouponAccepted

7210

In [441]:
# Acceptance Rate : Driver Group # 1

Driver_Acceptance_Rate_Set1 = (Driver_Set1_df_counts/TotalCouponAccepted) * 100
RoundedSet1 = Driver_Acceptance_Rate_Set1.round(2)
RoundedSet1

23.94

In [427]:
#Drivers who go to a bar less than once a month and are below the age of 25

Driver_Set2_df = data.query("Y == 1 and (age == '21' or age == 'below21' ) and (Bar == 'never' or Bar == 'less1')")[['Y', 'age', 'Bar']]
Driver_Set2_df.reset_index() [['Y', 'age', 'Bar']]

Unnamed: 0,Y,age,Bar
0,1,21,never
1,1,21,never
2,1,21,never
3,1,21,never
4,1,21,never
...,...,...,...
1226,1,21,never
1227,1,21,never
1228,1,21,never
1229,1,21,never


In [428]:
Driver_Set2_df_counts = Driver_Set2_df.value_counts().sum()
Driver_Set2_df_counts

1231

In [440]:
# Acceptance Rate : Driver Group # 2

Driver_Acceptance_Rate_Set2 = (Driver_Set2_df_counts/TotalCouponAccepted) * 100
RoundedSet2 = Driver_Acceptance_Rate_Set2.round(2)
RoundedSet2 

17.07

In [None]:
# Acceptance Rate is higher for drivers who go to a bar more than 
# once a month and are over the age of 25 compared to all others.

In [613]:

# Create a DataFrame for plotting (named plot_df so the coupon dataset in `data` is not overwritten)
plot_df = pd.DataFrame({
    'Category': ['Driver Group1', 'Driver Group2'],
    'Acceptance Rate (%)': [RoundedSet1, RoundedSet2],
    'Legend Description': ['Bar Visits > 1 per Month & Age greater than 25',
                           'Bar Visits < 1 per Month & Age under 25']
})

# Plot the chart
fig = px.bar(plot_df, x='Category', y='Acceptance Rate (%)', 
             title='Comparison of Driver Acceptance Rates',
             text='Acceptance Rate (%)', 
             color='Legend Description', 
             color_discrete_sequence=px.colors.qualitative.Set2)

fig.update_traces(textposition='outside')

fig.update_layout(
    xaxis_tickangle=0,  
    margin=dict(b=80),  
    width=900,  
    height=600 
)

# Show figure
fig.show()
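
The question asks for a comparison against "all others", which is the logical complement of the target group rather than a second hand-written filter. One way to express that is with a boolean mask and `~`. This is a sketch on toy data, not the notebook's own method (`toy`, `over_25`, `more_than_once`, and `target` are illustrative names):

```python
import pandas as pd

# Toy bar-coupon rows
toy = pd.DataFrame({
    "coupon": ["Bar"] * 6,
    "Bar": ["1~3", "gt8", "never", "1~3", "less1", "4~8"],
    "age": ["26", "50plus", "31", "21", "below21", "below21"],
    "Y": [1, 1, 0, 1, 0, 0],
})

# Age buckets are strings, so "over 25" means not in the two youngest buckets
over_25 = ~toy["age"].isin(["below21", "21"])
more_than_once = toy["Bar"].isin(["1~3", "4~8", "gt8"])

# Target group and its complement ("all others")
target = over_25 & more_than_once
rate_target = toy.loc[target, "Y"].mean()
rate_others = toy.loc[~target, "Y"].mean()
print(rate_target, rate_others)
```

Using `~target` guarantees that every row lands in exactly one of the two groups, including drivers who are over 25 but rarely visit bars.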


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry.


In [595]:
# Drivers who go to bars more than once a month and 
# had passengers that were NOT a kid and 
# had occupations other than farming, fishing, or forestry.

Driver_Set3_df = data.query("Y == 1 and passanger != 'Kid(s)' and occupation != 'Farming Fishing & Forestry' and (Bar == '1~3' or Bar == '4~8' or Bar == 'gt8')")[['Y', 'Bar', 'passanger', 'occupation']]
Driver_Set3_df.reset_index()[['Y', 'Bar', 'passanger', 'occupation']]

Unnamed: 0,Y,Bar,passanger,occupation
0,1,1~3,Friend(s),Student
1,1,1~3,Friend(s),Student
2,1,1~3,Friend(s),Student
3,1,1~3,Friend(s),Student
4,1,1~3,Friend(s),Student
...,...,...,...,...
2298,1,1~3,Friend(s),Food Preparation & Serving Related
2299,1,1~3,Alone,Food Preparation & Serving Related
2300,1,1~3,Alone,Food Preparation & Serving Related
2301,1,1~3,Alone,Food Preparation & Serving Related


In [497]:
Driver_Set3_df_counts = Driver_Set3_df.value_counts().sum()
Driver_Set3_df_counts

2303

In [498]:
# Comparison group:
# Drivers who go to bars less than once a month and 
# had passengers that were kid(s) and 
# had occupations in farming, fishing, or forestry.
# Note: this is narrower than "all others", which would be every driver not in Group 3.

Driver_Set4_df = data.query("Y == 1 and passanger == 'Kid(s)' and occupation == 'Farming Fishing & Forestry' and (Bar == 'never' or Bar == 'less1')")[['Y', 'Bar', 'passanger', 'occupation']]
Driver_Set4_df.reset_index()[['Y', 'Bar', 'passanger', 'occupation']]

Unnamed: 0,Y,Bar,passanger,occupation
0,1,never,Kid(s),Farming Fishing & Forestry
1,1,less1,Kid(s),Farming Fishing & Forestry
2,1,less1,Kid(s),Farming Fishing & Forestry
3,1,less1,Kid(s),Farming Fishing & Forestry
4,1,less1,Kid(s),Farming Fishing & Forestry


In [499]:
Driver_Set4_df_counts = Driver_Set4_df.value_counts().sum()
Driver_Set4_df_counts

5

In [502]:
# Total Coupons Accepted by Drivers
TotalCouponAccepted = df.query("Y == 1").value_counts().sum()
TotalCouponAccepted

7210

In [598]:
# Acceptance Rate : Driver Group # 3

Driver_Acceptance_Rate_Set3 = (Driver_Set3_df_counts/TotalCouponAccepted) * 100
RoundedSet3 = Driver_Acceptance_Rate_Set3.round(2)
RoundedSet3 


31.94

In [504]:
# Acceptance Rate : Driver Group # 4

Driver_Acceptance_Rate_Set4 = (Driver_Set4_df_counts/TotalCouponAccepted) * 100
RoundedSet4 = Driver_Acceptance_Rate_Set4.round(2)
RoundedSet4 

0.07
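The percentages above divide each group's accepted coupons by the total number of accepted coupons, so they describe each group's share of all acceptances. A conventional acceptance rate would instead divide by the number of coupons offered to that group. The sketch below contrasts the two on a small made-up frame; the `toy` data and the helper names `share_of_acceptances` and `acceptance_rate` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data: 8 offers, 'Y' is 1 when the coupon was accepted.
toy = pd.DataFrame({
    "Bar": ["1~3", "1~3", "4~8", "never", "less1", "never", "less1", "never"],
    "Y":   [1,     0,     1,     1,       1,       0,       1,       0],
})

# Drivers in the '1~3', '4~8' or 'gt8' bar-visit categories.
frequent = toy["Bar"].isin(["1~3", "4~8", "gt8"])

def share_of_acceptances(df, mask):
    """Accepted coupons in the group / all accepted coupons, in percent."""
    return 100 * (df.loc[mask, "Y"] == 1).sum() / (df["Y"] == 1).sum()

def acceptance_rate(df, mask):
    """Accepted coupons in the group / coupons offered to the group, in percent."""
    return 100 * df.loc[mask, "Y"].mean()

share = share_of_acceptances(toy, frequent)
rate = acceptance_rate(toy, frequent)
print(round(share, 2), round(rate, 2))
```

Here the frequent group accounts for 2 of the 5 acceptances but accepted 2 of its own 3 offers, so the two measures differ. Which one is appropriate depends on the question being asked.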

In [615]:
# Comparison

# Create a DataFrame for plotting (kept separate from the survey DataFrame `data`)
plot_df = pd.DataFrame({
    'Category': ['Driver Group 3', 'Driver Group 4'],
    'Acceptance Rate (%)': [RoundedSet3, RoundedSet4],
    'Legend Description': [
        'Bar Visits > 1 per Month & <br> Passengers NOT Kid(s) <br> & occupation other than <br> farming, fishing <br> or forestry',
        'Bar Visits < 1 per Month & <br> Passengers ARE Kid(s) <br> & occupation is <br> farming, fishing <br> or forestry']
})

# Create bar plot using Plotly
fig = px.bar(plot_df, x='Category', y='Acceptance Rate (%)',
             title='Comparison of Driver Acceptance Rates',
             text='Acceptance Rate (%)', 
             color='Legend Description',  # Use long text in legend
             color_discrete_sequence=['blue', 'green'])

# Adjust text positions & layout
fig.update_traces(textposition='outside')

fig.update_layout(
    xaxis_tickangle=0,  # Keep x-axis labels straight since they are now short
    margin=dict(b=80),  # Reduce bottom margin since labels are short
    width=1000,  # Set figure width
    height=600  # Set figure height
)


fig.show()

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K.



In [505]:
# Drivers who go to bars more than once a month and
# had passengers that were NOT a kid and
# were not widowed

Driver_Set5_df = data.query("Y == 1 and passanger != 'Kid(s)' and maritalStatus != 'Widowed' and (Bar == '1~3' or Bar == '4~8' or Bar == 'gt8')")[['Y', 'Bar', 'passanger', 'maritalStatus']]

In [577]:
Driver_Set5_df.reset_index()[['Y', 'Bar', 'passanger', 'maritalStatus']]

Unnamed: 0,Y,Bar,passanger,maritalStatus
0,1,1~3,Friend(s),Single
1,1,1~3,Friend(s),Single
2,1,1~3,Friend(s),Single
3,1,1~3,Friend(s),Single
4,1,1~3,Friend(s),Single
...,...,...,...,...
2298,1,1~3,Friend(s),Unmarried partner
2299,1,1~3,Alone,Unmarried partner
2300,1,1~3,Alone,Unmarried partner
2301,1,1~3,Alone,Unmarried partner


In [588]:
Driver_Set5_df_counts = Driver_Set5_df.value_counts().sum()
Driver_Set5_df_counts

2303

In [642]:
# Drivers who go to bars more than once a month and are under the age of 30

Driver_Set6_df = data.query("Y == 1 and (age == '21' or age == 'below21' or age == '26') and (Bar == '1~3' or Bar == '4~8' or Bar == 'gt8')")[['Y', 'Bar', 'age']]

In [643]:
Driver_Set6_df.reset_index()[['Y', 'Bar', 'age']]

Unnamed: 0,Y,Bar,age
0,1,1~3,21
1,1,1~3,21
2,1,1~3,21
3,1,1~3,21
4,1,1~3,21
...,...,...,...
1422,1,1~3,21
1423,1,1~3,21
1424,1,1~3,21
1425,1,1~3,21


In [644]:
Driver_Set6_df_counts = Driver_Set6_df.value_counts().sum()
Driver_Set6_df_counts

1427

In [522]:
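# Drivers who go to cheap restaurants more than 4 times a month and whose income is less than 50K.
# The filter below matches three income brackets; a lower bracket such as 'Less than $12500',
# if present in the data, is not matched.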

Driver_Set7_df = data.query("Y == 1 and (RestaurantLessThan20 == '4~8' or RestaurantLessThan20 == 'gt8') and (income == '$25000 - $37499' or income == '$12500 - $24999' or income == '$37500 - $49999')")[['Y', 'RestaurantLessThan20', 'income']]

In [523]:
Driver_Set7_df.reset_index()[['Y', 'RestaurantLessThan20', 'income']]

Unnamed: 0,Y,RestaurantLessThan20,income
0,1,4~8,$37500 - $49999
1,1,4~8,$37500 - $49999
2,1,4~8,$37500 - $49999
3,1,4~8,$37500 - $49999
4,1,4~8,$37500 - $49999
...,...,...,...
1076,1,4~8,$12500 - $24999
1077,1,4~8,$12500 - $24999
1078,1,4~8,$12500 - $24999
1079,1,4~8,$12500 - $24999


In [579]:
Driver_Set7_df_counts = Driver_Set7_df.value_counts().sum()
Driver_Set7_df_counts

1081
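The income filter above lists three brackets by hand. If the dataset also has a lowest bracket labelled `Less than $12500` (an assumption worth checking against `data['income'].unique()`), those drivers also earn under 50K but are left out. A minimal sketch of the same filter written with `isin` and named lists, on made-up rows; `UNDER_50K`, `CHEAP_GT_4`, and `toy` are hypothetical names introduced here.

```python
import pandas as pd

# Assumed bracket labels; verify them against data['income'].unique().
UNDER_50K = ["Less than $12500", "$12500 - $24999",
             "$25000 - $37499", "$37500 - $49999"]
# Cheap-restaurant visit categories above 4 times a month.
CHEAP_GT_4 = ["4~8", "gt8"]

# Hypothetical rows using the survey's column names.
toy = pd.DataFrame({
    "RestaurantLessThan20": ["4~8", "gt8", "1~3", "4~8"],
    "income": ["Less than $12500", "$37500 - $49999",
               "$12500 - $24999", "$50000 - $62499"],
    "Y": [1, 1, 1, 0],
})

mask = toy["RestaurantLessThan20"].isin(CHEAP_GT_4) & toy["income"].isin(UNDER_50K)
group7 = toy[mask]
print(len(group7))
```

Keeping the category labels in one list makes it easier to see which brackets are covered and to reuse the same definition in later cells.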

In [589]:
# Acceptance Rate : Driver Group # 5

Driver_Acceptance_Rate_Set5 = (Driver_Set5_df_counts/TotalCouponAccepted) * 100
RoundedSet5 = Driver_Acceptance_Rate_Set5.round(2)
RoundedSet5 

31.94

In [590]:
# Acceptance Rate : Driver Group # 6

Driver_Acceptance_Rate_Set6 = (Driver_Set6_df_counts/TotalCouponAccepted) * 100
RoundedSet6 = Driver_Acceptance_Rate_Set6.round(2)
RoundedSet6 

19.79

In [591]:
# Acceptance Rate : Driver Group # 7

Driver_Acceptance_Rate_Set7 = (Driver_Set7_df_counts/TotalCouponAccepted) * 100
RoundedSet7 = Driver_Acceptance_Rate_Set7.round(2)
RoundedSet7 

14.99

In [614]:
# Comparison:
# Group 5: go to bars more than once a month, had passengers that were not a kid, and were not widowed OR
# Group 6: go to bars more than once a month and are under the age of 30 OR
# Group 7: go to cheap restaurants more than 4 times a month and income is less than 50K

# Create DataFrame for plotting (kept separate from the survey DataFrame `data`)
plot_df = pd.DataFrame({
    'Category': ['Group 5', 'Group 6', 'Group 7'],
    'Acceptance Rate (%)': [RoundedSet5, RoundedSet6, RoundedSet7],
    'Legend Description': [
        'Bar Visits > 1 per Month & <br> Passengers - NOT Kid & Not Widowed',
        'Bar Visits > 1 per Month & <br> Age under 30',
        'Cheap Restaurant Visits > 4 times per month & <br> Income less than 50K'
    ]
})

# Create bar plot using Plotly
fig = px.bar(plot_df, x='Category', y='Acceptance Rate (%)',
             title='Comparison of Driver Acceptance Rates',
             text='Acceptance Rate (%)', 
             color='Legend Description',  # Use long text in legend
             color_discrete_sequence=px.colors.qualitative.Pastel1)


fig.update_traces(textposition='outside')

fig.update_layout(
    xaxis_tickangle=0,  
    margin=dict(b=80),  
    width=900,  
    height=600 
)

# Show figure
fig.show()


7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

In [633]:
# Create DataFrame for plotting (kept separate from the survey DataFrame `data`)
plot_df = pd.DataFrame({
    'Category': ['Group 1','Group 2', 'Group 3','Group 4', 'Group 5', 'Group 6','Group 7'],
    'Acceptance Rate (%)': [RoundedSet1, RoundedSet2, RoundedSet3, RoundedSet4, RoundedSet5, RoundedSet6, RoundedSet7],
    'Legend Description': [ 'Bar Visits > 1 per Month and Age <br> greater than 25<br>',
                            'Bar Visits < 1 per month and Age under 25<br>',
                            'Bar Visits > 1 per Month & Passengers<br> NOT Kid(s) & Driver occupation <br> other than <br> farming, fishing or<br> forestry<br>',
                            'Bar Visits < 1 per Month & <br> Passengers ARE Kid(s) <br> & Driver occupation is farming, fishing <br> or forestry<br>',
                            'Bar Visits > 1 per Month & <br> Passengers - NOT Kid & Drivers<br> Not Widowed<br>',
                            'Bar Visits > 1 per Month & <br> Age under 30 <br>',
                            'Cheap Restaurant Visits > 4 times per month <br> and Income < 50K']
        })

# Create bar plot using Plotly
fig = px.bar(plot_df, x='Category', y='Acceptance Rate (%)',
             title='Comparison of Driver Acceptance Rates',
             text='Acceptance Rate (%)', 
             color='Legend Description',  
             color_discrete_sequence=px.colors.qualitative.Set2)


fig.update_traces(textposition='outside')

fig.update_layout(
    xaxis_tickangle=0,  
    margin=dict(b=80),  
    width=900,  
    height=700 
)

# Show figure
fig.show()

<!-- # Based on the chart above, I can infer what influences a driver's likelihood of accepting a bar coupon.
Here is my hypothesis about drivers who accept bar coupons:

1. FREQUENT BAR GOERS ARE MOST LIKELY TO ACCEPT BAR COUPONS: Based on Driver Groups 3 (31.94%) and 5 (31.94%), drivers who already visit bars more than once a month and who do NOT have kids as passengers are MORE LIKELY TO ACCEPT bar-related coupons, as this aligns with their leisure activities.
2. Based on Driver Group 4, drivers who have children as passengers may be less likely to accept bar coupons, as they are more likely to choose family-friendly destinations over bars.
3. Based on Driver Groups 3 and 4, drivers in rural occupations (farming, fishing, forestry) may not frequent bars as much, possibly due to limited accessibility, cultural factors, work schedules, or healthier habits.
4. Based on Driver Group 6, younger drivers may be more open to bar visits and drinking as a social activity, making them more likely to accept a bar coupon.
5. Based on Driver Group 7, drivers with lower incomes seem more budget-conscious and more likely to spend money at cheap restaurants. This suggests that they are less likely to spend money at bars, even with a bar coupon, and may prioritize essential spending over leisure activities.


### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

In [656]:
# Drivers who go to CoffeeHouse more than once a month and are Single and are under the age of 30


Driver_Set8_df = data.query("Y == 1 and (age == '21' or age == 'below21' or age == '26') and maritalStatus == 'Single' and (CoffeeHouse  == '1~3' or CoffeeHouse  == '4~8' or CoffeeHouse == 'gt8')")[['Y', 'CoffeeHouse', 'age' , 'maritalStatus']]
Driver_Set8_df.reset_index()[['Y', 'CoffeeHouse', 'age', 'maritalStatus']]

Unnamed: 0,Y,CoffeeHouse,age,maritalStatus
0,1,gt8,26,Single
1,1,gt8,26,Single
2,1,gt8,26,Single
3,1,gt8,26,Single
4,1,gt8,26,Single
...,...,...,...,...
1115,1,gt8,26,Single
1116,1,gt8,26,Single
1117,1,gt8,26,Single
1118,1,gt8,26,Single


In [684]:
Driver_Set8_df_counts = Driver_Set8_df.value_counts().sum()
Driver_Set8_df_counts

1120

In [685]:
# Acceptance Rate : Driver Group # 8
Driver_Acceptance_Rate_Set8 = (Driver_Set8_df_counts/TotalCouponAccepted) * 100
RoundedSet8 = Driver_Acceptance_Rate_Set8.round(2)
RoundedSet8 

15.53

In [689]:
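# Drivers who go to CoffeeHouse less than once a month and are NOT Single (intended: over the age of 30).
# The age clause below chains `!=` tests with `or`, which is true for every row, so no age filter is applied.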

Driver_Set9_df = data.query("Y == 1 and (age != '21' or age != 'below21' or age != '26') and maritalStatus != 'Single' and (CoffeeHouse  == 'less1')")[['Y', 'CoffeeHouse', 'age' , 'maritalStatus']]
Driver_Set9_df.reset_index()[['Y', 'CoffeeHouse', 'age', 'maritalStatus']]

Unnamed: 0,Y,CoffeeHouse,age,maritalStatus
0,1,less1,26,Married partner
1,1,less1,26,Married partner
2,1,less1,26,Married partner
3,1,less1,26,Married partner
4,1,less1,26,Married partner
...,...,...,...,...
1195,1,less1,50plus,Divorced
1196,1,less1,50plus,Divorced
1197,1,less1,50plus,Divorced
1198,1,less1,50plus,Divorced


In [690]:
Driver_Set9_df_counts = Driver_Set9_df.value_counts().sum()
Driver_Set9_df_counts

1200
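The "over 30" condition in the cell above is written as `age != '21' or age != 'below21' or age != '26'`. Any single age label differs from at least two of those three values, so the chain is true for every row, which is consistent with age `26` appearing in the output. By De Morgan's law, the negation of "21 or below21 or 26" needs `and` between the `!=` tests, or equivalently a negated `isin`. A small sketch on made-up labels; `UNDER_30` and `toy` are hypothetical names.

```python
import pandas as pd

UNDER_30 = ["below21", "21", "26"]
toy = pd.DataFrame({"age": ["below21", "21", "26", "31", "46", "50plus"]})

# As written in the cell above: True for every row, so nothing is filtered out.
always_true = (toy["age"] != "21") | (toy["age"] != "below21") | (toy["age"] != "26")

# De Morgan: not (A or B or C) == (not A) and (not B) and (not C).
over_30_and = (toy["age"] != "21") & (toy["age"] != "below21") & (toy["age"] != "26")

# Equivalent and shorter: negate a membership test.
over_30 = ~toy["age"].isin(UNDER_30)

print(always_true.tolist())
print(over_30.tolist())
```

In a `query` string the same idea would be expressed by replacing `or` with `and` inside the age clause. Doing so would change the group sizes reported in this section.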

In [691]:
# Acceptance Rate : Driver Group # 9
Driver_Acceptance_Rate_Set9 = (Driver_Set9_df_counts/TotalCouponAccepted) * 100
RoundedSet9 = Driver_Acceptance_Rate_Set9.round(2)
RoundedSet9 

16.64

In [692]:
# Drivers who go to CoffeeHouse less than once a month and are NOT Single and are below the age of 30


Driver_Set10_df = data.query("Y == 1 and (age == '21' or age == 'below21' or age == '26') and maritalStatus != 'Single' and (CoffeeHouse  == 'less1')")[['Y', 'CoffeeHouse', 'age' , 'maritalStatus']]
Driver_Set10_df.reset_index()[['Y', 'CoffeeHouse', 'age', 'maritalStatus']]

Unnamed: 0,Y,CoffeeHouse,age,maritalStatus
0,1,less1,26,Married partner
1,1,less1,26,Married partner
2,1,less1,26,Married partner
3,1,less1,26,Married partner
4,1,less1,26,Married partner
...,...,...,...,...
359,1,less1,26,Married partner
360,1,less1,26,Married partner
361,1,less1,26,Married partner
362,1,less1,26,Married partner


In [693]:
Driver_Set10_df_counts = Driver_Set10_df.value_counts().sum()
Driver_Set10_df_counts

364

In [694]:
# Acceptance Rate : Driver Group # 10
Driver_Acceptance_Rate_Set10 = (Driver_Set10_df_counts/TotalCouponAccepted) * 100
RoundedSet10 = Driver_Acceptance_Rate_Set10.round(2)
RoundedSet10

5.05

In [695]:
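# Drivers who go to CoffeeHouse more than once a month and are NOT Single (intended: over the age of 30).
# The age clause below chains `!=` tests with `or`, which is true for every row, so no age filter is applied.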

Driver_Set11_df = data.query("Y == 1 and (age != '21' or age != 'below21' or age != '26') and maritalStatus != 'Single' and (CoffeeHouse  == '1~3' or CoffeeHouse  == '4~8' or CoffeeHouse == 'gt8')")[['Y', 'CoffeeHouse', 'age' , 'maritalStatus']]
Driver_Set11_df.reset_index()[['Y', 'CoffeeHouse', 'age', 'maritalStatus']]

Unnamed: 0,Y,CoffeeHouse,age,maritalStatus
0,1,1~3,46,Married partner
1,1,1~3,46,Married partner
2,1,1~3,46,Married partner
3,1,1~3,46,Married partner
4,1,1~3,46,Married partner
...,...,...,...,...
2240,1,4~8,21,Unmarried partner
2241,1,4~8,21,Unmarried partner
2242,1,4~8,21,Unmarried partner
2243,1,4~8,21,Unmarried partner


In [697]:
Driver_Set11_df_counts = Driver_Set11_df.value_counts().sum()
Driver_Set11_df_counts

2245

In [698]:
# Acceptance Rate : Driver Group # 11
Driver_Acceptance_Rate_Set11 = (Driver_Set11_df_counts/TotalCouponAccepted) * 100
RoundedSet11 = Driver_Acceptance_Rate_Set11.round(2)
RoundedSet11

31.14

In [705]:
# Drivers who go to CoffeeHouse less than once a month and are Single and are below the age of 30

Driver_Set12_df = data.query("Y == 1 and (age == '21' or age == 'below21' or age == '26') and maritalStatus == 'Single' and (CoffeeHouse  == 'less1')")[['Y', 'CoffeeHouse', 'age' , 'maritalStatus']]
Driver_Set12_df.reset_index()[['Y', 'CoffeeHouse', 'age', 'maritalStatus']]


Unnamed: 0,Y,CoffeeHouse,age,maritalStatus
0,1,less1,21,Single
1,1,less1,21,Single
2,1,less1,21,Single
3,1,less1,21,Single
4,1,less1,21,Single
...,...,...,...,...
431,1,less1,26,Single
432,1,less1,26,Single
433,1,less1,26,Single
434,1,less1,26,Single


In [706]:
Driver_Set12_df_counts = Driver_Set12_df.value_counts().sum()
Driver_Set12_df_counts

436

In [707]:
# Acceptance Rate : Driver Group # 12
Driver_Acceptance_Rate_Set12 = (Driver_Set12_df_counts/TotalCouponAccepted) * 100
RoundedSet12 = Driver_Acceptance_Rate_Set12.round(2)
RoundedSet12

6.05

In [714]:
# Create DataFrame for plotting (kept separate from the survey DataFrame `data`)
plot_df = pd.DataFrame({
    'Category': ['Group 8','Group 9','Group 10', 'Group 11', 'Group 12'],
    'Acceptance Rate (%)': [RoundedSet8, RoundedSet9,RoundedSet10,RoundedSet11,RoundedSet12],
    'Legend Description': [ 'CoffeeHouse Visits > 1 per Month and <br> Age below 30 and Single<br>',
                            'CoffeeHouse Visits < 1 per month and <br> Age over 30 and Not Single<br>',
                            'CoffeeHouse Visits < 1 per month and <br> Age below 30 and Not Single<br>',
                            'CoffeeHouse Visits > 1 per month and <br> Age over 30 and Not Single<br>',
                            'CoffeeHouse Visits < 1 per month and <br> Age below 30 and Single<br>'
                          ]
        })

# Create bar plot using Plotly
fig = px.bar(plot_df, x='Category', y='Acceptance Rate (%)',
             title='Comparison of Driver Acceptance Rates',
             text='Acceptance Rate (%)', 
             color='Legend Description', 
             color_discrete_sequence=px.colors.qualitative.Set2)


fig.update_traces(textposition='outside')

fig.update_layout(
    xaxis_tickangle=0,  
    margin=dict(b=80),  
    width=900,  
    height=700 
)

# Show figure
fig.show()
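The groups charted above are drawn from all accepted coupons rather than from coffee house offers only. One way to focus the comparison is to restrict the rows to a single coupon type first and then compute the mean of `Y` per segment. The sketch below assumes the dataset has a `coupon` column with a `'Coffee House'` value (an assumption to verify against `data['coupon'].unique()`); the `toy` rows and the `segment` column are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; 'coupon' and its labels are assumed, not confirmed by the notebook.
toy = pd.DataFrame({
    "coupon":      ["Coffee House"] * 6 + ["Bar"] * 2,
    "CoffeeHouse": ["gt8", "gt8", "less1", "less1", "1~3", "never", "gt8", "less1"],
    "Y":           [1,     0,     0,       1,       1,     0,       0,     1],
})

# Keep only coffee house offers, then label each driver as a frequent or rare visitor.
coffee = toy[toy["coupon"] == "Coffee House"].copy()
is_frequent = coffee["CoffeeHouse"].isin(["1~3", "4~8", "gt8"])
coffee["segment"] = is_frequent.map({True: "frequent", False: "rare"})

# Mean of the 0/1 outcome is the fraction of offers accepted in each segment.
rates = (coffee.groupby("segment")["Y"].mean() * 100).round(2)
print(rates)
```

Because `Y` is 0 or 1, its mean within a segment is the fraction of that segment's offers that were accepted, so no separate total is needed.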

<!-- Hypothesis:

1. Group 11 (CoffeeHouse Visits > 1 per Month, Age > 30, Not Single) has the highest acceptance rate at 31.14%. Older individuals who are not single and frequently visit coffeehouses appear HIGHLY RECEPTIVE to COFFEE coupons. They may value their coffee habits over drinking alcoholic beverages at social gatherings, and may prefer coffee at breakfast with family to nights out.

2. Group 8 (CoffeeHouse Visits > 1 per Month, Age < 30, Single) has an acceptance rate of 15.53%. Young, single individuals who frequently visit coffeehouses may be SOMEWHAT MOTIVATED by coffee coupons BUT not as much as older, frequent visitors. They may already buy coffee regularly and not feel a strong need for discounts. They may also prefer specific brands or premium drinks that are likely NOT COVERED by the coupon.

3. Groups 10 and 12: Group 10 (CoffeeHouse Visits < 1 per Month, Age < 30, Not Single) has the lowest acceptance rate at 5.05%. Group 12 (CoffeeHouse Visits < 1 per Month, Age < 30, Single) also has a low acceptance rate of 6.05%. These drivers rarely visit coffeehouses, and coffee coupons may not be enticing enough to change their habits. They may also not be morning people and may prefer other beverages (alcoholic or non-alcoholic) over coffee.

4. Group 9 (CoffeeHouse Visits < 1 per Month, Age > 30, Not Single) has a 16.64% acceptance rate. Even though these individuals do not visit coffeehouses frequently, their COUPON ACCEPTANCE RATE is still HIGHER than that of younger, infrequent visitors. They may view coffee outings as a special occasion and appreciate a discount. Being NOT SINGLE, they might see coffee as a social or professional activity.

 
