### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaraunt near where you are driving. Would you accept that coupon and take a short detour to the restaraunt? Would you accept the coupon but use it on a sunbsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaraunt? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then ask the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’.  There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50). 

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons.  To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing github repository as your first portfolio piece. 





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [587]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import math

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [588]:
data = pd.read_csv('data/coupons.csv')

In [589]:
print(data.info())
data.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
5,No Urgent Place,Friend(s),Sunny,80,6PM,Restaurant(<20),2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
6,No Urgent Place,Friend(s),Sunny,55,2PM,Carry out & Take away,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
7,No Urgent Place,Kid(s),Sunny,80,10AM,Restaurant(<20),2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
8,No Urgent Place,Kid(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [590]:
print(data.shape)
data.isnull().sum()

(12684, 26)


destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

3. Decide what to do about your missing data -- drop, replace, other...

In [591]:
data = pd.read_csv('data/coupons.csv')

# there's a typo in the `passanger` column so we'll rename it
print('Renaming column: passanger -> passenger...','\n')
data.rename(columns={'passanger': 'passenger'}, inplace=True)

# analysing the missing values
percent_missing = data.isnull().sum() * 100 / len(data)
missing_value_df = pd.DataFrame({'column_name': data.columns,
                                 'percent_missing': percent_missing})
missing_value_df.sort_values('percent_missing', inplace=True)
print('Missing values precentage by column:\n\n', missing_value_df.query('percent_missing > 0'),'\n')
    
print('Cleaning data...','\n')

# Dropping the "car" column since it missing values at rate of 99.15%
print('Droping "car" column...','\n')
data.drop('car', axis=1, inplace=True)

# Replacing the nulls in columns that have 0.8-1.71% missing values with the most popular categorical value
print('Replacing nulls with the most popular categorical value for the other columns that contain nulls...','\n')
for column in ['Bar', 'CoffeeHouse', 'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50']:
    data[column].fillna(data[column].value_counts().idxmax(), inplace=True)

print(f'Null values after data cleanup: {data.isnull().sum().sum()}')

Renaming column: passanger -> passenger... 

Missing values precentage by column:

                                column_name  percent_missing
Bar                                    Bar         0.843582
RestaurantLessThan20  RestaurantLessThan20         1.024913
CarryAway                        CarryAway         1.190476
Restaurant20To50          Restaurant20To50         1.490066
CoffeeHouse                    CoffeeHouse         1.710817
car                                    car        99.148534 

Cleaning data... 

Droping "car" column... 

Replacing nulls with the most popular categorical value for the other columns that contain nulls... 

Null values after data cleanup: 0


I dropped the "car" column since it missing values at `99.15%` rate.
For columns that have `0.8-1.71%` missing values, I replaced the nulls with the most popular categorical value.

4. What proportion of the total observations chose to accept the coupon? 

In [592]:
print(f'p={(data.Y == 1).sum()/len(data)}')

p=0.5684326710816777


5. Use a bar plot to visualize the `coupon` column.

Histogram and bar plots in plotly are interchangable in Plotly (<a href="https://plotly.com/python/bar-charts/#:~:text=px.bar%20and%20px.histogram%20are%20designed%20to%20be%20nearly%20interchangeable%20in%20their%20call%20signatures%2C%20so%20as%20to%20be%20able%20to%20switch%20between%20aggregated%20and%20disaggregated%20bar%20representations.">explanation</a>), so I used a histogram to plot the aggregated data in bar like plot.


In [593]:
fig = px.histogram(data, x='coupon')
fig.update_layout(title='Coupon Categories (ascending)', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Category', 
                   width=600, height=500,
                   showlegend=True,
                   xaxis={'categoryorder':'total ascending'},
                   barmode='group') 
fig.update_traces(marker_color='blue').update_xaxes(type='category')
fig.show()

6. Use a histogram to visualize the temperature column.

In [594]:
fig = px.histogram(data, x='temperature').update_xaxes(type='category')
fig.update_layout(title='Temperature', title_x=0.5, title_y=0.95,
                   yaxis_title='Count', 
                   xaxis_title='Temperature (F°)', 
                   width=600, height=500,
                   showlegend=False).show()

I added a histogram with a break down of acceptance/rejection of coupons for the temperature variable.

In [595]:
fig = px.histogram(data, x='temperature', color='Y', labels={'Y': 'Coupon Acceptance'})
fig.update_layout(title='Coupon Acceptance vs Temperature (F°)', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Temperature (F°)',\
                   width=600, height=500,
                   showlegend=True,
                   xaxis={'categoryorder':'total ascending'},
                   barmode='group')
newnames = {'1':'Accepted', '0': 'Not Accepted'}
fig.for_each_trace(lambda t: t.update(name = newnames[t.name],
                                      legendgroup = newnames[t.name],
                                      hovertemplate = t.hovertemplate.replace(t.name, newnames[t.name])))
fig.update_xaxes(type='category')
fig.show()

In [596]:
# helper function to generate ratio acceptance DataFrame
def get_acceptance_dataframe(mydf, category):
    df_all = mydf.groupby(category).agg('count')['Y']
    df_category_accepted = mydf.query('Y == 1').groupby(category).agg('count')['Y']
    df_category_rejected = mydf.query('Y == 0').groupby(category).agg('count')['Y']
    category_accepted_ratio = df_category_accepted / df_all
    category_rejected_ratio = 1 - category_accepted_ratio
    categoty_acceptance_frame = {'Accepted': category_accepted_ratio,
         'Not Accepted': category_rejected_ratio}
    cat_frame = pd.DataFrame(categoty_acceptance_frame) 
    return cat_frame

In [597]:
df_accept_temperature = get_acceptance_dataframe(data, 'temperature')
fig = px.bar(df_accept_temperature, x=df_accept_temperature.index, y=['Accepted',  'Not Accepted'], labels={'variable': 'Coupon Acceptance'}).update_xaxes(type='category')
fig.add_hline(y=0.56, line_width=3, line_dash="dash", line_color="black",  annotation_text="Aggregated aceptance<br>rate baseline (0.56)", 
                            annotation_position="right")
fig.update_layout(title='Coupon Acceptance Rate vs Temperature (F°)', title_x = 0.45, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis={'categoryorder':'sum ascending'},
              xaxis_title = 'Temperature (F°)')
fig.show()


<span style='font-size:20px;'>&#128161;</span> Coupons were most offered on warmer days (`80F°`). We can also see that coupons that were offered at this warmer temperature were accepted more frequently (`60%`) by drivers, compared to oupons that were offered at colder days (`55F°` and `30F°`) that produces acceptance rate of `53.6%`, `53.1%`, respectively. 

### Investigating the Bar Coupons

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [598]:
df_bar_coupons = data.query('coupon == "Bar"')
df_grouped = df_bar_coupons.groupby('Y')
df_grouped.head(50)

Unnamed: 0,destination,passenger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,...,never,1~3,4~8,1~3,1,1,0,0,1,0
13,Home,Alone,Sunny,55,6PM,Bar,1d,Female,21,Unmarried partner,...,never,1~3,4~8,1~3,1,0,0,1,0,1
17,Work,Alone,Sunny,55,7AM,Bar,1d,Female,21,Unmarried partner,...,never,1~3,4~8,1~3,1,1,1,0,1,0
24,No Urgent Place,Friend(s),Sunny,80,10AM,Bar,1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,0,1,1
35,Home,Alone,Sunny,55,6PM,Bar,1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
672,Work,Alone,Sunny,55,7AM,Bar,1d,Male,31,Married partner,...,never,4~8,1~3,1~3,1,1,1,0,1,1
686,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Male,36,Married partner,...,1~3,4~8,1~3,1~3,1,1,0,0,1,0
690,Home,Alone,Sunny,55,6PM,Bar,1d,Male,36,Married partner,...,1~3,4~8,1~3,1~3,1,0,0,1,0,1
694,Work,Alone,Sunny,55,7AM,Bar,1d,Male,36,Married partner,...,1~3,4~8,1~3,1~3,1,1,1,0,1,0


Helper functions

In [599]:
# calculates the acceptance rate given a dataframe
def get_acceptance_rate(df):
    return len(df.query('Y == 1')) / len(df)

# generates a Plotly bar plot in order to make it easier to plot & keeping the code more concise 
def generate_fig_plotly(x, y, title='Bar Promo Acceptance Rate Analysis By Groups', yaxis_title='Acceptance Rate', color=None, w=600, h=600, df=None, plot_line=None, plot_line_value=None, plot_line_text=None):
    if((color is not None) & (df is not None)):
        fig = px.bar(df, x = x, y = y, text = ['%.3f'%x for x in y], color=color)
    else:
        fig = px.bar(x = x, y = y, text = ['%.3f'%x for x in y])
    
    if (plot_line is not None):
        fig.add_hline(y=plot_line_value, line_width=3, line_dash="dash", line_color="black",  annotation_text=plot_line_text, 
                            annotation_position="right")
    
        
    fig.update_layout(title=title, title_x = 0.5, title_y = 0.95,
                      width=w, height=h, 
                      yaxis_title = yaxis_title,
                      xaxis_title = 'Groups')
    fig.show()

2. What proportion of bar coupons were accepted?

In [600]:
accepted_rate = get_acceptance_rate(df_bar_coupons)
print(f'proportion of bar coupons:\n\tAccepted:     {accepted_rate}\n\tNot accepted: {1 - accepted_rate}')

generate_fig_plotly(x=('Accepted', 'Didn\'t Accept'), y=[accepted_rate, 1 - accepted_rate], title='Bar Promo Acceptance Rate', yaxis_title='Rate')

proportion of bar coupons:
	Accepted:     0.41001487357461575
	Not accepted: 0.5899851264253843


<span style='font-size:20px;'>&#128161;</span>We can see in that bar coupons were accepted at a rate (`41%`) which as significantly less tham the aggregated acceptance rate  (`56.8%`) of all coupons.

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.

In [601]:
bar_accepted_less_than_3 = len(df_bar_coupons.query('Bar in ("never", "less1", "1~3")').query('Y == 1')) / len(df_bar_coupons.query('Bar in ("never", "less1", "1~3")')) 
bar_accepted_more_than_3 = len(df_bar_coupons.query('Bar in ("4~8","gt8")').query('Y == 1')) / len(df_bar_coupons.query('Bar in ("4~8","gt8")'))

generate_fig_plotly(x=('Went to Bars <= 3 Times a Month', 'Went to Bars > 3 Times a Month'), 
             y=(bar_accepted_less_than_3, bar_accepted_more_than_3), 
             title='Bar Promo Acceptance Rate vs Times Went to Bars',
             plot_line=True, plot_line_value=0.41, plot_line_text='<br>aggregated<br>rate (0.41)')

<span style='font-size:20px;'>&#128161;</span> Drivers that went to bars more than 3 times a month, accepted bar coupons more frequently (`76.8%`) than drivers that went to bars less than 3 times, who presented acceptance rate of (`37%`).

I decided to also compare the acceptance rate between those who went to a bar less than once a month to those who went more, since 'less than once in a month' category is used in the next questions.

In [602]:
bar_accepted_less_than_1 = len(df_bar_coupons.query('Bar in ("never", "less1")').query('Y == 1')) / len(df_bar_coupons.query('Bar in ("never", "1~3")')) 
bar_accepted_more_than_1 = len(df_bar_coupons.query('Bar in ("1~3", "4~8","gt8")').query('Y == 1')) / len(df_bar_coupons.query('Bar in ("1~3", "4~8","gt8")'))

generate_fig_plotly(x=('Went to Bars < Once a Month', 'Went to Bars >= Once a Month'), 
             y=(bar_accepted_less_than_1, bar_accepted_more_than_1), 
             title='Bar Promo Acceptance Rate vs Times Went to Bars',
             plot_line=True, plot_line_value=0.41, plot_line_text='<br>aggregated<br>rate (0.41)')

<span style='font-size:20px;'>&#128161;</span> Drivers that went to bars more than once a month, accepted bar coupons more frequently (`68.7%`) than drivers that went to bars less than 1 time (`33.4%`).

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [603]:
above_25_and_more_than_1_bar_visit_acceptance = get_acceptance_rate(df_bar_coupons.query('(Bar in ("1~3", "4~8", "gt8")) and (age not in ("21","below21"))'))
the_rest_acceptance = get_acceptance_rate(df_bar_coupons.query('(Bar not in ("1~3", "4~8", "gt8")) or (age in ("21","below21"))'))

below_21_and_more_than_1_bar_visit_acceptance = get_acceptance_rate(df_bar_coupons.query('(Bar in ("1~3", "4~8", "gt8")) and (age in ("21","below21"))'))
print(below_21_and_more_than_1_bar_visit_acceptance)
print(f'Bar promo acceptance rate of those who went to a bar more than once a month and their age older than 25:\n\tAcceptance Rate: ' +
      f'{"%.3f"% above_25_and_more_than_1_bar_visit_acceptance}\n\tAll others:      {"%.3f"% the_rest_acceptance}\n' +
      '\nWe can definitely see that Bar coupons were accepted had higher rate for older people that went to the bar more\nthan once a month.')

generate_fig_plotly(['Age > 25 AND<br>Went to Bar >= Once a Month', 'Age < 25 OR<br>Went to Bar < Once a Month'],
               [above_25_and_more_than_1_bar_visit_acceptance, the_rest_acceptance], plot_line=True, plot_line_value=0.41, plot_line_text='<br>aggregated<br>rate (0.41)')

0.6704545454545454
Bar promo acceptance rate of those who went to a bar more than once a month and their age older than 25:
	Acceptance Rate: 0.695
	All others:      0.335

We can definitely see that Bar coupons were accepted had higher rate for older people that went to the bar more
than once a month.


<span style='font-size:20px;'>&#128161;</span> Drivers that went to bars more than once a month and are older than 25, accepted bar coupons more frequently (`69.5%`) than drivers that went to bars less than 1 time or were younger than 25 (`33.5%`).

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [604]:
# identifying driver groups that went to the bar more than once & drivers below 21 (included)
filter_criteria = df_bar_coupons.query('(Bar in ["1~3", "4~8", "gt8"]) and (passenger in ["Friend(s)", "Partner"]) and (occupation != "Farming Fishing & Forestry")')
the_rest = df_bar_coupons.query('(Bar not in ["1~3", "4~8", "gt8"]) or (passenger not in ["Friend(s)", "Partner"]) or (occupation == "Farming Fishing & Forestry")')
filter_criteria_acceptance = get_acceptance_rate(filter_criteria)
the_rest_acceptance = get_acceptance_rate(the_rest)

print(f'Bar promo acceptance rate for drivers that:\n  * Went to bars more than once a month AND\n  * Had a paaganer that was not kid     AND\n  * Had occupations other than farming, fishing, or forestry\n\n\tAcceptance rate (for the group above): {"%.3f"% filter_criteria_acceptance}\n' +
    f'\tAcceptance rate for all others: {"%.3f"% the_rest_acceptance}')

generate_fig_plotly(x=['Go to Bars >= Once a Month AND<br>Have Adult Passengers AND<br>Occupation is Not Farming Fishing & Forestry', 'The Rest'], 
                    y=[filter_criteria_acceptance, the_rest_acceptance], plot_line=True, plot_line_value=0.41, plot_line_text='<br>aggregated<br>rate (0.41)', w=600)

Bar promo acceptance rate for drivers that:
  * Went to bars more than once a month AND
  * Had a paaganer that was not kid     AND
  * Had occupations other than farming, fishing, or forestry

	Acceptance rate (for the group above): 0.718
	Acceptance rate for all others: 0.377


<span style='font-size:20px;'>&#128161;</span> Drivers that went to bars more than once a month and had adult passengers and were not working in farmin/fishing/forestry, accepted bar coupons more frequently (`71.8%`) than the rest of the drivers (`37.7%`).

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [605]:
filter_criteria_1 = df_bar_coupons.query('(Bar in ["1~3", "4~8", "gt8"]) and (passenger in ["Friend(s)", "Partner"]) and (maritalStatus != "Widowed")')
the_rest_1 = df_bar_coupons.query('(Bar not in ["1~3", "4~8", "gt8"]) or (passenger not in ["Friend(s)", "Partner"]) or (maritalStatus == "Widowed")')
print(f'Bar promo acceptance rate for filter criteria 1:'
      f'\n  * Went to bars more than once a month'
      f'\n  * Had a paaganer that was not kid'
      f'\n  * Had marital status other than widowed'
      f'\n\n\tAcceptance rate (for the group above): {"%.3f"% get_acceptance_rate(filter_criteria_1)}\n'
      f'\tAcceptance rate for all others: {"%.3f"% get_acceptance_rate(the_rest_1)}\n')

filter_criteria_2 = df_bar_coupons.query('(Bar in ["1~3", "4~8", "gt8"]) and (age in ["21", "26", "below21"])')
the_rest_2 = df_bar_coupons.query('(Bar not in ["1~3", "4~8", "gt8"]) or (age not in ["21", "26", "below21"])')
print(f'Bar promo acceptance rate for filter criteria 2:'
      f'\n  * Went to bars more than once a month'
      f'\n  * Age under 30\n\n\tAcceptance rate (for the group above): {"%.3f"% get_acceptance_rate(filter_criteria_2)}\n'
      f'\tAcceptance rate for all others: {"%.3f"% get_acceptance_rate(the_rest_2)}\n')

filter_criteria_3 = df_bar_coupons.query('(RestaurantLessThan20 in ["4~8", "gt8"]) and (income in ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"])')
the_rest_3 = df_bar_coupons.query('(RestaurantLessThan20 not in ["4~8", "gt8"]) or (income not in ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"])')
print(f'Bar promo acceptance rate for filter criteria 3:'
      f'\n  * Went to Cheap Restaurants More than 4 Times in a Month'
      f'\n  * Income Less than 50K\n\n\tAcceptance rate (for the group above): {"%.3f"% get_acceptance_rate(filter_criteria_3)}\n'
      f'\tAcceptance rate for all others: {"%.3f"% get_acceptance_rate(the_rest_3)}\n')

x = ['group 1' , 'group 1<br>all the rest', 'group 2' , 'group 2<br>all the rest', 'group 3' , 'group 3<br>all the rest']
y = [get_acceptance_rate(filter_criteria_1), get_acceptance_rate(the_rest_1), 
     get_acceptance_rate(filter_criteria_2), get_acceptance_rate(the_rest_2), 
     get_acceptance_rate(filter_criteria_3), get_acceptance_rate(the_rest_3)]
plot_data = { 'x': x, 'y': y, 'Groups': 
    ['Go to bars more than once a month AND<br>Had passengers that were not a kid AND<br>Not Widowed',
    'Go to bars more than once a month AND<br>Had passengers that were not a kid AND<br>Not Widowed', 
    'Go to bars more than once a month AND<br>Age under 30', 
    'Go to bars more than once a month AND<br>Age under 30', 
    'Went to cheap restaurants > 4 times<br>in a month AND<br>Income less than 50K', 
    'Went to cheap restaurants > 4 times<br>in a month AND<br>Income less than 50K'] }

df_tmp = pd.DataFrame(plot_data)

y_3f = ['%.3f'%x for x in y]
generate_fig_plotly(df=df_tmp, x='x', y=plot_data.get('y'), color = 'Groups', w=800, h=600, plot_line=True, plot_line_value=0.41, plot_line_text='aggregated bar coupons<br>acceptance rate = 0.41')

Bar promo acceptance rate for filter criteria 1:
  * Went to bars more than once a month
  * Had a paaganer that was not kid
  * Had marital status other than widowed

	Acceptance rate (for the group above): 0.718
	Acceptance rate for all others: 0.377

Bar promo acceptance rate for filter criteria 2:
  * Went to bars more than once a month
  * Age under 30

	Acceptance rate (for the group above): 0.722
	Acceptance rate for all others: 0.346

Bar promo acceptance rate for filter criteria 3:
  * Went to Cheap Restaurants More than 4 Times in a Month
  * Income Less than 50K

	Acceptance rate (for the group above): 0.453
	Acceptance rate for all others: 0.401



<span style='font-size:20px;'>&#128161;</span> Drivers that went to bars more than once a month and had adult passengers and were not widowed accepted bar coupons more frequently (`71.9%`) than the rest of the drivers (`37.7%`). Drivers that went to bars more than once a month and were younger than 30 years accepted bar coupons more frequently (`72.1%`) than the rest of the drivers (`34.5%`). Drivers that went to cheap restaurants less than 4 times a month and had income below $50K accepted bar coupons slightly more frequently (`45.3%`) than the rest of the drivers (`40.1%`). It seems like the later group didn't deviate much from the aggregated acceptance rate for bar coupons (`41`%), compared to the first two groups that showed higher rate acceptency (`71.9%` and `72.1%`)

**Bar coupon's histogram subplots**
<br>Before movinge to findings and summary of the `Bar` coupons, I decided to add histogram subplot chart, so I'll have the distributaion of each variable, in order to understand certain categories impact on the final findings. 

In [624]:
columns = ['destination', 'passenger', 'weather', 'temperature', 'time', 'coupon', 'expiration', 'gender', 'age', 'has_children','Bar', 'CoffeeHouse', 'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50', 'toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min', 'direction_same', 'direction_opp', 'Y', 'occupation', 'maritalStatus', 'income', 'education']
dimension = math.ceil(np.sqrt(len(columns)))
fig = make_subplots(cols = dimension, rows = dimension, subplot_titles=(list(map(lambda x: x.replace('_', " ").title(), columns))))

df_accepted = df_bar_coupons.query('Y == 1')
df_rejected = df_bar_coupons.query('Y == 0')

for i, column in enumerate(columns):
    row = math.floor((i + dimension) / dimension)
    col = ((i + dimension) % dimension) + 1
    fig.add_histogram(x = df_accepted[column], row = row, col = col, name='Accepted', marker_color='teal', showlegend=(i==0))
    fig.add_histogram(x = df_rejected[column], row = row, col = col, name='Didn\'t Accept', marker_color='red', showlegend=(i==0))
    fig['layout'][f'yaxis{i+1}']['title'] = 'Count'
    fig['layout'][f'xaxis{i+1}']['title'] = column.replace('_', " ").title()
    fig['layout'][f'yaxis{i+1}']['title']['standoff'] = 0
    fig['layout'][f'xaxis{i+1}']['title']['standoff'] = 0
    fig.update_traces(marker_line_width=1,marker_line_color="white")
    
    
fig.update_layout(height=1500, width=1700, title_text="Variables Histograms for Bar coupons")
fig.update_xaxes(type='category')
fig.show()

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

### Findings

Coupons were:
- most offered to `Coffee House` goers
- overall coupons acceptance rate was (`56.9%`)
- overall coupons rejection rate was (`43.1%`)
- most offered on warmer days (`80F°`) 
- when offered on warmer temperature had higher acceptance rate (`60%`), compared to oupons that were offered       at colder days (`55F°`,`30F°`) that produces acceptance rate of (`53.6%`, `53.1%`), respectively
- when offered on warmer weather (`80F°`) had higher acceptance rate (`60%`), compared to the aggregated acceptance 
  rate (`56.9%`)

`Bar` coupons were:
- offered the second least out of 5 different coupon categories
- most offered on colder days days (`55F°` or `30F°`), that were most sunny 
- accepted at rate (`41.2%`) less than the overall coupuns aggregate acceptance rate (`56.9%`)
- rejection rate (`58.8%`) was higher than the overall coupuns aggregate rejection rate (`43.1%`)
- accepted at higher rate (`76.8%`) by drivers who went to bars more frequently ( more than 3 times a month), compared to drivers that went less frequently to bars, who presented acceptance rate of (`37%`)
- accepted at higher rate (`68.7%`) by drivers who went to bars more once a month, compared to drivers that went  to bars less than 1 time and had acceptance rate of (`33.4%`)
- accepted at higher rate (`69.5%`) by drivers who went to bars more than once a month and were older than 25 compared to the rest of the drivers that were offered bar coupons (`33.5%`)
- rejection rate was at higher rate (`66.5%`) for drivers who went to bars less than once a month or were older than 25, compared to the aggregated rejection rate of (`58.8%`)
- accepted at higher rate (`71.8%`) by drivers who went to bars more than once a month and had adult passengers and were not working in farming/fishing/forestry, than the rest of the drivers that were offered bar coupons (`37.7%`)
- accepted at higher rate (`71.9%`) by drivers who went to bars more than once a month and had adult passengers and were not widowed, compared the rest of the drivers (`37.7%`)
- when looking on the above 2 findings, we notice a very similar acceptance rate. That can be explained by looking at histograms and find that the number of drivers who were widowed or working in farming/fishing/forestry was so little, that when taking the group that doesn't include them, we received almost the whole `Bar` coupon drivers. That led to similarity in acceptance rate of (`71.8%`) and (`71.9%`) when the only difference between the above groups were not including widowed or not including drivers who work in farming/fishing/forestry.
- accepted at higher rate (`72.1%`) by drivers who went to bars more than once a month and were younger than 30  years, than the rest of the drivers (`34.5%`)
- accepted at a rate (`45.3%`) slightly higher than the aggregated rate for all bar coupons (`41%`) for drivers who went frequently (> 4 a month)  to cheap restaurants and had income of less than `$50K`, while the rest of the drivers showed a similar acceptance rate (`40.1%`) to the `Bar` aggregated rate (`41%`)
- It seems like the later group didn't deviate much from the aggregated acceptance rate for `Bar` coupons (`41%`), compared to the first two groups that showed higher rate acceptency (`71.9% and 72.1%`)
  



### Summary
Drivers were more likely (`60%`) to accept coupons on warmer weather (`80F°`), compared to colder weather and compared to the overall aggregated acceptance rate (`56.9%`).

Drivers that were frequent bar-goers (more than 3 times a month) were more likely (`76.8%`) to accept `Bar` coupons, while drivers that were less frequent bar-goers showed a higher `Bar` coupons rejection rate (`63%`).

Drivers that were bar-goers more than once a month still had higher `Bar` coupons acceptance rate (`68.7%`), but it was slighltly less compared to the acceptance rate (`76.8%`) of the more frequent bar-goers.

Consider offering more coupons to drivers that went more frequently to bars, since `Bar` coupons were more likely to be accepted by them.

For drivers that went to bars more than once a month, it seems that the split of age of the driver (to older and younger than 25) was not an indicaitive factor of the acceptance rate by itself. Looking at drivers older than 25 that went to bars once a month, produced a similar acceptance rate (`69.5%`) of `Bar` coupons to drivers that just went to bars once a month (`68.7%`).

Consider offering less `Bar` coupons to drivers who went to bars less than once a month or were older than 25, since the rejection rate of this group was at higher rate (`66.5%`), compared to the aggregated rejection rate of (`58.8%`)

Drivers that went to bars more than once a month and were younger than 30 years, were more likely (`72.1%`) to accept `Bar` coupon, than drivers that just went to a bar once a month (`68.7%`). When looking at the age histograms we can confirm that drivers above 30 are less likely to accept `Bar` coupons, compared to those below 30. The conclusion we can draw is  that offer `Bar` coupons to younger drivers will more likely produce higher acceptance rates.

The sample size of drivers who are widowed (21) or work in farming/fishing/forestry (9) was so little compared to the number of `Bar` coupon drivers (2017), that when trying to exclude them from comparisons, there was no affect on the acceptance rate. Also, because of the small sample size it'll be very hard to draw conclusions on these particular groups.

Drivers who went more frequent (> 4 a month) to cheap restaurants and had income of less than $50K, had similar acceptance rate to the aggregate acceptance rate of `Bar` coupons. These categories when combined together are not a good indication of acceptance rate for `Bar`.


### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons. 

In order to make that decision I wanted to look into coupons acceptance rate overall and per coupon group. 

In [607]:
accepted_rate = get_acceptance_rate(data)
print(f'proportion of all coupons:\n\tAccepted:     {accepted_rate}\n\tNot accepted: {1 - accepted_rate}')
generate_fig_plotly(x=('Accepted', 'Not Accepted'), y=(accepted_rate, 1 - accepted_rate), title='All Coupons Acceptance Rate')

proportion of all coupons:
	Accepted:     0.5684326710816777
	Not accepted: 0.4315673289183223


In [608]:
result = get_acceptance_dataframe(data, 'coupon')
fig = px.bar(result, x=result.index, y=['Accepted',  'Not Accepted'])
fig.update_layout(title='Coupons Acceptance Rate per Group', title_x = 0.5, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis_title = 'Coupon Groups').update_yaxes(range = [0, 1]).show()

### Investigating the Coffee House Coupons

Creating a new `DataFrame` that contains just the `Coffee House` coupons.

In [609]:
df_coffee_house_coupons = data.query('coupon == "Coffee House"')

Looking into the aggregated acceptance rate of `Coffee House` coupons. We'll need this data as a base to draw conclusions and insights later.

In [610]:
accepted_rate = get_acceptance_rate(df_coffee_house_coupons)
print(f'proportion of coffee coupons:\n\tAccepted: {accepted_rate}\n\tNot accepted: {1 - accepted_rate}')

generate_fig_plotly(x=('Accepted', 'Not Accepted'), y=(accepted_rate, 1 - accepted_rate), title='Coffee House coupons Acceptance Rate')

proportion of coffee coupons:
	Accepted: 0.49924924924924924
	Not accepted: 0.5007507507507507


<span style='font-size:20px;'>&#128161;</span>W`Coffee House` coupons had about the same acceptance and rejection rates at `49.9%` and `50.1%`, respectively.

**Histograms and Acceptance Ratio Bar Plots**

In order to further investigate `Coffee House` coupons we'll look into the different the histograms, that are broken down into accepted/not accepted, and the acceptance rate of all variables. Following that, we'll focus and dive into certain variables in the data set that pop out as candidates of inetrest.  

In [625]:
columns = ['destination', 'passenger', 'weather', 'temperature', 'time', 'coupon', 'expiration', 'gender', 'age', 'has_children','Bar', 'CoffeeHouse', 'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50', 'toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min', 'direction_same', 'direction_opp', 'Y', 'occupation', 'maritalStatus', 'income', 'education']
dimension = math.ceil(np.sqrt(len(columns)))
fig = make_subplots(cols = dimension, rows = dimension, subplot_titles=(list(map(lambda x: x.replace('_', " ").title(), columns))))

df_accepted = df_coffee_house_coupons.query('Y == 1')
df_rejected = df_coffee_house_coupons.query('Y == 0')

for i, column in enumerate(columns):
    row = math.floor((i + dimension) / dimension)
    col = ((i + dimension) % dimension) + 1
    fig.add_histogram(x = df_accepted[column], row = row, col = col, name='Accepted', marker_color='teal', showlegend=(i==0))
    fig.add_histogram(x = df_rejected[column], row = row, col = col, name='Didn\'t Accept', marker_color='red', showlegend=(i==0))
    fig['layout'][f'yaxis{i+1}']['title'] = 'Count'
    fig['layout'][f'xaxis{i+1}']['title'] = column.replace('_', " ").title()
    fig['layout'][f'yaxis{i+1}']['title']['standoff'] = 0
    fig['layout'][f'xaxis{i+1}']['title']['standoff'] = 0
    fig.update_traces(marker_line_width=1,marker_line_color="white")
    
    
fig.update_layout(height=1500, width=1500, title_text="Variables Histograms for Coffee House coupons")
fig.update_xaxes(type='category')
fig.show()

Similar to the histograms above, we'll generate bar plots for all variables, with a split to acceptance and rejection rates.

I added a horizontal line that represents the aggregated acceptance ratio for `Coffee House` coupons (p=0.5, `50%`), so we can easily find intereting trends in certain categorical variables.

In [612]:
import math
from plotly.subplots import make_subplots
columns = ('passenger', 'weather', 'age', 'time', 'temperature', 'destination', 'expiration', 'gender', 'maritalStatus', 'CoffeeHouse', 'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50', 'occupation', 'income', 'education')

dimension = math.ceil(np.sqrt(len(columns)))
fig = make_subplots(cols = dimension, rows = dimension, subplot_titles=(list(map(lambda x: x.replace('_', " ").title(), columns))))

for i, column in enumerate(columns):
    row = math.floor((i + dimension) / dimension)
    col = ((i + dimension) % dimension) + 1
    df_tmp = get_acceptance_dataframe(df_coffee_house_coupons, column)
    fig.add_trace(go.Bar(x=df_tmp.index, y=df_tmp['Accepted'], marker_color='teal', showlegend=(i==0), name='Accepted'), row = row, col = col)
    fig.add_trace(go.Bar(x=df_tmp.index, y=df_tmp['Not Accepted'],  marker_color='red', showlegend=(i==0), name='Didn\'t Accept'), row = row, col = col)
    fig['layout'][f'yaxis{i+1}']['title'] = 'Acceptance Rate'
    fig['layout'][f'xaxis{i+1}']['title'] = column.replace('_', " ").title()
    fig['layout'][f'yaxis{i+1}']['title']['standoff'] = 0
    fig['layout'][f'xaxis{i+1}']['title']['standoff'] = 0
    fig.update_yaxes(range = [0, 0.8])
    fig.add_hline(y=0.5, line_width=1, line_dash="dash", line_color="black",  annotation_text="0.5", 
                  annotation_position="right").update_xaxes(type='category')

fig.update_layout(height=1500, width=1500, title_text="Acceptance Rate Bar Charts")

fig.show()

Another way of presenting the above ratio data (with full plots vs sub plots)

In [613]:
for category in ('passenger', 'weather', 'age', 'time', 'temperature', 'expiration', 'gender', 'maritalStatus', 'CoffeeHouse', 'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50', 'income', 'occupation', 'education'):
    df_tmp = get_acceptance_dataframe(df_coffee_house_coupons, category)
    fig = px.bar(df_tmp, x=df_tmp.index, y=['Accepted',  'Not Accepted'])
    fig.update_layout(title=f'Coffee House Promo Acceptance Rate vs {category.title()}', title_x = 0.5, title_y = 0.95,
                  width=700, height=500,
                  yaxis_title = 'Acceptance Rate', 
                  xaxis={'categoryorder':'sum ascending'},
                  xaxis_title = f'{category.title()} Categories')
    fig.add_hline(y=0.5, line_width=3, line_dash="dash", line_color="black",  annotation_text="Coffee House acceptance<br>rate baseline (0.5)", 
                  annotation_position="right").update_xaxes(type='category')
    fig.show()

**Temprature affect on acceptance rate**

Lets compare the acceptance rate between drivers who drove in different weather conditions.

We'll start with looking at the totals and then at the acceptance break down.

In [614]:
fig = px.histogram(df_coffee_house_coupons, x='temperature', color='Y', labels={"Y": "Accepted Promo"},
                  color_discrete_sequence=["red", "blue"])
fig.update_layout(title='Temperature (F°) vs Accepted Promo', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Temperature (F°)',
                   width=600, height=600,
                   showlegend=True,
                   xaxis={'categoryorder':'total ascending'},
                   barmode='stack')
fig.update_xaxes(type='category')
fig.show()

<span style='font-size:20px;'>&#128161;</span>We can see from the bar plot above that 80F was the most popular temerature for drivers that were offered `Coffee House` coupons. We can also see that warmer temperatures led to a better acceptance totals - 1272 accepted vs 1129 rejected, while in colder weather (30F° and 55F°) we noticed more coupon rejections than acceptance.

With that in mind, lets build a bar plot that will compare the coupon acceptance ratios among the different temperatues.

In [615]:
coffee_temp_acceptance_frame = get_acceptance_dataframe(df_coffee_house_coupons, 'temperature')

result = pd.DataFrame(coffee_temp_acceptance_frame)
show_order = ['never','less1','1~3', 'gt8','4~8']
fig = px.bar(result, x=result.index, y=['Accepted',  'Not Accepted'])
fig.update_layout(title='Coffee House Promo Acceptance Rate vs Visit Frequency', title_x = 0.5, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis={'categoryorder':'sum descending'},
              xaxis_title = 'Coffee House Visit Frequency (per month)')
fig.add_hline(y=0.5, line_width=3, line_dash="dash", line_color="black",  annotation_text="Coffee House acceptance<br>rate baseline (0.5)", 
              annotation_position="right")
fig.update_yaxes(range = [0, 1]).update_xaxes(categoryorder='array', categoryarray=show_order).update_xaxes(type='category').show()

<span style='font-size:20px;'>&#128161;</span>We see that we can confirm our observations from earlier.
Warmer weather (80F°) led to higher Coffe House coupons acceptence rate of (`52%`) - even compared to the aggragated (`50%`) rate, while colder weather temperatures of 30F° and 55F° saw higher rejection rates (`55.6%`,  `54.4%`), respectively.

**Coffee House Visit Frequency affect on Acceptance Rate**

Lets compare the acceptance rate of `Coffee House` coupons between drivers that go to `Coffee House` less and more frequent (per month).


In [653]:
show_order = ['never','less1','1~3', '4~8', 'gt8']
fig = px.histogram(df_coffee_house_coupons, x='CoffeeHouse')
fig.update_layout(title='Coffee House visit Frequency Histogram', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Coffee House Visit Frequency (per month)', 
                   width=630, height=600,
                   showlegend=True,
                   xaxis={'categoryorder':'total ascending'},
                   barmode='group') 
fig.update_traces(marker_color='blue').update_xaxes(type='category').update_xaxes(categoryorder='array', categoryarray=show_order).show()

coffee_house_frequency_frame = get_acceptance_dataframe(df_coffee_house_coupons, 'CoffeeHouse')

fig = px.bar(coffee_house_frequency_frame, x=coffee_house_frequency_frame.index, y=['Accepted',  'Not Accepted'])
fig.update_layout(title='Coffee House Promo Acceptance Rate vs Visit Frequency', title_x = 0.5, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis={'categoryorder':'sum descending'},
              xaxis_title = 'Coffee House Visit Frequency (per month)')
fig.add_hline(y=0.5, line_width=3, line_dash="dash", line_color="black",  annotation_text="Coffee House acceptance<br>rate baseline (0.5)", 
              annotation_position="right")
fig.update_yaxes(range = [0, 1]).update_xaxes(categoryorder='array', categoryarray=show_order).show()

more_than_once_a_month = df_coffee_house_coupons.query('CoffeeHouse in ("4~8", "1~3", "gt8")')
coffee_house_accepted_more_than_1_visit = len(df_coffee_house_coupons.query('CoffeeHouse in ("4~8", "1~3", "gt8")').query('Y == 1')) / len(df_coffee_house_coupons.query('CoffeeHouse in ("4~8", "1~3", "gt8")')) 
coffee_house_accepted_less_than_1_visit = len(df_bar_coupons.query('CoffeeHouse not in ("4~8", "1~3", "gt8")').query('Y == 1')) / len(df_bar_coupons.query('CoffeeHouse not in ("4~8", "1~3", "gt8")'))

generate_fig_plotly(x=('Went to Coffee House >= Once a Month', 'Went to Coffee House < Once a Month'), 
             y=(coffee_house_accepted_more_than_1_visit, coffee_house_accepted_less_than_1_visit), 
             title='Coffee House Acceptance Rate vs Visit Frequency (Aggregated)',
             plot_line=True, plot_line_value=0.499, plot_line_text='<br>aggregated<br>rate (0.499)')

more_than_4_a_month = df_coffee_house_coupons.query('CoffeeHouse in ("4~8", "gt8")')
less_than_4_a_month = df_coffee_house_coupons.query('CoffeeHouse not in ("4~8", "gt8")')
coffee_house_accepted_more_than_4_visit = len(more_than_4_a_month.query('Y == 1')) / len(more_than_4_a_month)
coffee_house_accepted_less_than_4_visit = len(less_than_4_a_month.query('Y == 1')) / len(less_than_4_a_month)

generate_fig_plotly(x=('Went to Coffee House<br>>= 4 times a Month', 'Went to Coffee House<br>< 4 times a Month'), 
             y=(coffee_house_accepted_more_than_4_visit, coffee_house_accepted_less_than_4_visit), 
             title='Coffee House Acceptance Rate vs Visit Frequency (Aggregated)',
             plot_line=True, plot_line_value=0.499, plot_line_text='<br>aggregated<br>rate (0.499)')


<span style='font-size:20px;'>&#128161;</span> Coffee House coupons were offered less to drivers that went to coffee houses 4 or more times a month. We can learn from the acceptance ratio bar that drivers who went to coffee houses more than once a month, had a higher acceptance rate (`66%`), with compared to the aggregated acceptance rate (`50%`).

**Coffee House Driver's Geo Location affect on acceptance rate**

Lets compare the acceptance rate of `Coffee House` coupons among drivers with different geographical location  compared to the destination coupon (in driving minutes). The tracked values were more than 5, more than 15 and more than 25 minutes driving time to the destination, where the coupon was offered.

In [657]:
dict_frame = {'To Coupon Time': ['toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min'], 'Accepted': [], 'Not Accepted': []}

print(f'\nTotal drivers that were offered Coffee House coupon: {len(df_coffee_house_coupons)}\n')
print(f'Total drivers that were offered Coffee House coupon per Geo location:')
drivers_more_than_5_min_from_dest = 0
for frequency in ('toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min'):
    df_coffee_house_coupons_GEQ = data.query(f'(coupon == "Coffee House") and ({frequency} == 1)')
    all_df = df_coffee_house_coupons_GEQ.groupby(frequency).agg('count')['Y']
    print(all_df.index.name, all_df.iloc[0])
    drivers_more_than_5_min_from_dest += all_df.iloc[0]
    accepted =  df_coffee_house_coupons_GEQ.query('Y == 1').groupby(frequency).agg('count')['Y']
    ratio_accepted = accepted / all_df
    dict_frame['Accepted'].append(ratio_accepted.iloc[0])
    ratio_not_accepted = 1 - accepted / all_df
    dict_frame['Not Accepted'].append(ratio_not_accepted.iloc[0])

print(f'Precentage drivers that were offered Coffee House coupon that under 15 minutes from the destination: {"%.3f"%(100 * (1  - ((2073 + 310) / len(df_coffee_house_coupons) )))}%')    
result = pd.DataFrame.from_dict(dict_frame)
fig = px.bar(result, x='To Coupon Time', y=['Accepted',  'Not Accepted'])
fig.update_layout(title='Coffee House Promo Acceptance Rate vs Driving Time to Coupon (minutes)', title_x = 0.5, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis={'categoryorder':'sum descending'},
              xaxis_title = 'Driving Time to Coupon (minutes)')
fig.add_hline(y=0.5, line_width=3, line_dash="dash", line_color="black",  annotation_text="Coffee House acceptance<br>rate baseline (0.5)", 
                            annotation_position="right")
fig.update_yaxes(range = [0, 1]).update_xaxes(categoryorder='array', categoryarray=show_order).show()


Total drivers that were offered Coffee House coupon: 3996

Total drivers that were offered Coffee House coupon per Geo location:
toCoupon_GEQ5min 3996
toCoupon_GEQ15min 2073
toCoupon_GEQ25min 310
Precentage drivers that were offered Coffee House coupon that under 15 minutes from the destination: 40.365%


<span style='font-size:20px;'>&#128161;</span> Drivers were at least 5 minutes from the `Coffee House` destination (3996) - which led to the same acceptance rate (`50%`) to both coupons that were 5 or more minutes from the destination and the aggregated `Coffee House` acceptance rate.
We can also notice that there's a negative correlation between the distance from the distination in minutes, and the acceptance rate. The longer the distance, the less acceptance rate of `Coffee House` coupons. It also apparent that drivers that were 15 or 25 minutes away from the coffee house coupon, rejected the coupon in a higher rates (`54.6%`, `65.4`), respectively, compared to the `Coffee House` aggregated coupon rejection rate (`50%`) and also to drivers that were 5 minutes away from the destination.

**Coffee House Driver's age affect on acceptance rate**


In [648]:
show_order = ['below21','21','26', '31','36', '41', '46', '50plus']
fig = px.histogram(df_coffee_house_coupons, x='age')
fig.update_layout(title='Driver\'s Age Histogram', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Driver\'s Age (years)', 
                   width=650, height=600,
                   showlegend=True,
                    xaxis={'categoryorder':'total ascending'},
                    barmode='group') 
fig.update_traces(marker_color='blue').update_xaxes(type='category').update_xaxes(categoryorder='array', categoryarray=show_order).show()

df_accept_age = get_acceptance_dataframe(df_coffee_house_coupons, 'age')

fig = px.bar(df_accept_age, x=df_accept_age.index, y=['Accepted',  'Not Accepted'])
fig.update_layout(title='Coffee House Promo Acceptance Rate vs Driver\'s Age', title_x = 0.5, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis={'categoryorder':'sum descending'},
              xaxis_title = 'Driver\'s Age (years)')
fig.add_hline(y=0.5, line_width=3, line_dash="dash", line_color="black",  annotation_text="Coffee House acceptance<br>rate baseline (0.5)", 
              annotation_position="right")
fig.update_yaxes(range = [0, 1]).update_xaxes(categoryorder='array', categoryarray=show_order).show()

<span style='font-size:20px;'>&#128161;</span> Younger drivers below the age of 21 accepted most frequently the `Coffee House` coupon at (`69.6%`) rate. We can also see that they were the age group that was offered the least (`3.8%`) `Coffee House` coupons. We can also observe that older drivers, over 50 years old, rejected the `Coffee House` coupon in a higher rate (`57.9%`) than all other age groups and with comparison to the basline aggregated rejection rate (`50%`) for `Coffee House`.

**Coupon expiration period affect on acceptance rate**

In [619]:
fig = px.histogram(df_coffee_house_coupons, x='expiration')
fig.update_layout(title='Coupon Expiration Histogram', title_x=0.5, title_y=0.95,
                   yaxis_title='Count', 
                   xaxis_title='Expiration', 
                   width=630, height=500,
                   showlegend=True,
                    xaxis={'categoryorder':'total ascending'},
                    barmode='group') 
fig.update_traces(marker_color='blue').update_xaxes(type='category').update_xaxes(categoryorder='array', categoryarray=show_order).show()

df_accept_age = get_acceptance_dataframe(df_coffee_house_coupons, 'expiration')

fig = px.bar(df_accept_age, x=df_accept_age.index, y=['Accepted',  'Not Accepted'])
fig.update_layout(title='Coffee House Promo Acceptance Rate vs Expiration', title_x = 0.5, title_y = 0.95,
              width=700, height=500,
              yaxis_title = 'Acceptance Rate', 
              xaxis={'categoryorder':'sum descending'},
              xaxis_title = 'Expiration')
fig.add_hline(y=0.5, line_width=3, line_dash="dash", line_color="black",  annotation_text="Coffee House acceptance<br>rate baseline (0.5)", 
              annotation_position="right")
fig.update_yaxes(range = [0, 1]).update_xaxes(categoryorder='array', categoryarray=show_order).show()

<span style='font-size:20px;'>&#128161;</span> `Coffee House` coupon with longer `expiration` period of 1 day were offered more (2227 vs 1769 times), and had higher acceptance rate (`58.3%`), compared to coupons that were offered with expiration of only 2 hours (`43.1%`). 

**Cheap restaurants and low income affect on acceptance rate**


In [665]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=df_coffee_house_coupons.RestaurantLessThan20))
fig.add_trace(go.Histogram(x=df_coffee_house_coupons.income))
fig.update_layout(title='Restaurants < 20 and Income Histograms', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Restaurant < 20                                                                                                     Income (USD$)',
                   xaxis_automargin = True,
                   xaxis = dict(
                         tickmode = 'array',
                         tickvals = ['1~3', 'less1', 'never', 'gt8', '4~8', '$37500 - $49999', '$62500 - $74999', '$12500 - $24999', '$75000 - $87499',
                                     '$50000 - $62499', '$25000 - $37499', '$100000 or More', '$87500 - $99999', 'Less than $12500'],
                         ticktext = ['1~3', 'less1', 'never', 'gt8', '4~8', '37.5K-49.9K', '62.5K-$74.9K', '12.5K-24.9K', '7.5K-87.49K',
                                     '50K-62.49K', '25K-37.49K', '100K or More', '87.5K-99.99K', 'Less than 12.5K']))
fig.show()

filter_criteria = df_coffee_house_coupons.query('(RestaurantLessThan20 in ["gt8"]) and (income in ["Less than $12500", "$12500 - $24999"])')
print('Precentage of Cheap Restaurants > 8 AND Income < 25K:', f'{"%.3f"%(100 * len(filter_criteria) / len(df_coffee_house_coupons))}%\n')
the_rest = df_coffee_house_coupons.query('(RestaurantLessThan20 not in ["gt8"]) or (income not in ["Less than $12500", "$12500 - $24999"])')
print(f'Coffee House promo acceptance rate for filter criteria:'
      f'\n  * Went to cheap restaurants more than 8 times a month'
      f'\n  * Income less than 25K\n\n\tAcceptance rate (for the group above): {"%.3f"% get_acceptance_rate(filter_criteria)}\n'
      f'\tAcceptance rate for all others:        {"%.3f"% get_acceptance_rate(the_rest)}\n')

generate_fig_plotly(['Cheap Restaurants > 8 AND<br> Income < 25K', 'Cheap Restaurants <= 8 OR<br> Income >= 25K'],
               [get_acceptance_rate(filter_criteria), get_acceptance_rate(the_rest)], plot_line=True, plot_line_value=0.5,
                   title='Coffee House Acceptance Rate Analysis By Groups')

Precentage of Cheap Restaurants > 8 AND Income < 25K: 1.977%

Coffee House promo acceptance rate for filter criteria:
  * Went to cheap restaurants more than 8 times a month
  * Income less than 25K

	Acceptance rate (for the group above): 0.709
	Acceptance rate for all others:        0.495



<span style='font-size:20px;'>&#128161;</span> `Coffee House` coupons that were offered to drivers that ate more than 8 times a month at cheap restaurants and had lower income (less than $25K) had higher acceptance rate (`70.8%`), compared to coupons that were offered to others (`49.5%`). 

**Cheap restaurants and low income affect on acceptance rate**

In [621]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=df_coffee_house_coupons.Restaurant20To50))
fig.add_trace(go.Histogram(x=df_coffee_house_coupons.income))
fig.update_layout(title='Restaurants 20 to 50 and Income Histograms', title_x=0.5, title_y=0.95, 
                   yaxis_title='Count', 
                   xaxis_title='Restaurant 20 to 50                                                                           Income (USD$)',
                   xaxis_automargin = True,
                   xaxis = dict(
                         tickmode = 'array',
                         tickvals = ['1~3', 'less1', 'never', 'gt8', '4~8', '$37500 - $49999', '$62500 - $74999', '$12500 - $24999', '$75000 - $87499',
                                     '$50000 - $62499', '$25000 - $37499', '$100000 or More', '$87500 - $99999', 'Less than $12500'],
                         ticktext = ['1~3', 'less1', 'never', 'gt8', '4~8', '37.5K-49.9K', '62.5K-74.9K', '12.5K-24.9K', '7.5K-87.49K',
                                     '50K-62.49K', '25K-37.49K', '100K or More', '87.5K-99.99K', 'Less than 12.5K'],
                         )
                  )
fig.show()
filter_criteria = df_coffee_house_coupons.query('(Restaurant20To50 in ["gt8", "4~8"]) and (income in ["$50000 - $62499", "$62500 - $74999", "$75000 - $87499", "$87500 - $99999", "$100000 or More"])')
the_rest = df_coffee_house_coupons.query('(Restaurant20To50 not in ["gt8", "4~8"]) or (income not in ["$50000 - $62499", "$62500 - $74999", "$75000 - $87499", "$87500 - $99999", "$100000 or More"])')
print(f'Coffee House promo acceptance rate for filter criteria:'
      f'\n  * Went to more expensive restaurants more than 4 times a month'
      f'\n  * Income 50K or more\n\n\tAcceptance rate (for the group above): {"%.3f"% get_acceptance_rate(filter_criteria)}\n'
      f'\tAcceptance rate for all others:        {"%.3f"% get_acceptance_rate(the_rest)}\n')

generate_fig_plotly(['More Expensive Restaurants >= 4 AND<br> Income >= $50K', 'More Expensive Restaurants < 4 OR<br> Income < 50K'],
               [get_acceptance_rate(filter_criteria), get_acceptance_rate(the_rest)], plot_line=True, plot_line_value=0.5, plot_line_text='coffe house<br>aggregated<br>acceptance<br>= 0.5',
                   title='Coffee House Acceptance Rate Analysis By Groups')

Coffee House promo acceptance rate for filter criteria:
  * Went to more expensive restaurants more than 4 times a month
  * Income 50K or more

	Acceptance rate (for the group above): 0.513
	Acceptance rate for all others:        0.499



<span style='font-size:20px;'>&#128161;</span> Drivers that went more than 4 times a month to more expensive restaurants and had higher income ($50K and more), accepted `Coffee House` coupons at slightly higher acceptance rate (`51.2%`), compared to coupons that were offered to others (`49.8%`). 

**Drivers older than 25 that visited coffee houses more than once a month affect on acceptance rate**

In [622]:
above_25_and_more_than_1_bar_visit_acceptance = get_acceptance_rate(df_coffee_house_coupons.query('(CoffeeHouse in ("1~3", "4~8", "gt8")) and (age not in ("21","below21"))'))
the_rest_acceptance = get_acceptance_rate(df_coffee_house_coupons.query('(CoffeeHouse not in ("1~3", "4~8", "gt8")) or (age in ("21","below21"))'))

# below_21_and_more_than_1_bar_visit_acceptance = get_acceptance_rate(df_coffee_house_coupons.query('(CoffeeHouse in ("1~3", "4~8", "gt8")) and (age in ("21","below21"))'))
print(below_21_and_more_than_1_bar_visit_acceptance)
print(f'Coffee House promo acceptance rate of those who went to coffee houses more than once a month and their age older than 25:\n\tAcceptance Rate: ' +
      f'{"%.3f"% above_25_and_more_than_1_bar_visit_acceptance}\n\tAll others:      {"%.3f"% the_rest_acceptance}\n' +
      '\nWe can definitely see that Coffee House coupons were accepted at higher rate by older drivers (over 25) that went to the coffee houses more than once a month.')

generate_fig_plotly(['Age > 25 AND<br>Went to Coffee House >= Once a Month', 'Age < 25 OR<br>Went to Coffee House < Once a Month'],
               [above_25_and_more_than_1_bar_visit_acceptance, the_rest_acceptance], plot_line=True, plot_line_value=0.5,
                    plot_line_text='coffe house<br>aggregated<br>acceptance<br>= 0.5', title='Coffee House Acceptance Rate Analysis By Groups')

0.6704545454545454
Coffee House promo acceptance rate of those who went to coffee houses more than once a month and their age older than 25:
	Acceptance Rate: 0.638
	All others:      0.428

We can definitely see that Coffee House coupons were accepted at higher rate by older drivers (over 25) that went to the coffee houses more than once a month.


<span style='font-size:20px;'>&#128161;</span> Drivers accepted Coffee House coupons at a significantly higher rate (`%63.8`) when they were older than 25 and went to coffee houses more than once a month, compared to all others acceptance rate (`%42.8`).

**Drivers with no kids that work as healthcare practitioners or building/cleaning/maintenance jobs affect on acceptance rate**

In [644]:
criteria_acceptance = get_acceptance_rate(df_coffee_house_coupons.query('(occupation in ("Healthcare Practitioners & Technical", "Building & Grounds Cleaning & Maintenance")) and (has_children == 0)'))
the_rest_acceptance = get_acceptance_rate(df_coffee_house_coupons.query('(occupation not in ("Healthcare Practitioners & Technical", "Building & Grounds Cleaning & Maintenance", "Healthcare Support")) or (has_children != 0)'))
print('Count of drivers with no kids that work as healthcare practitioners or building/cleaning/maintenance jobs:', len(df_coffee_house_coupons.query('(occupation in ("Healthcare Practitioners & Technical", "Building & Grounds Cleaning & Maintenance")) and (has_children == 0)')))
print(f'Coffee House promo acceptance rate of those with no kids and work as healthcare practitioners or building, grounds cleaning and maintenance jobs":\n\tAcceptance Rate: ' +
      f'{"%.3f"% criteria_acceptance}\n\tAll others:      {"%.3f"% the_rest_acceptance}\n')

generate_fig_plotly(['Don\'t have kids AND occupation is<br>"healthcare practitioners or building" or<br>"grounds cleaning and maintenance"', 
                     'Have kids OR occupation is NOT<br>"healthcare practitioners or building" or<br>"grounds cleaning and maintenance"'],
                     [criteria_acceptance, the_rest_acceptance], plot_line=True, plot_line_value=0.5,
                     plot_line_text='coffe house<br>aggregated<br>acceptance<br>= 0.5', title='Coffee House Acceptance Rate Analysis By Groups')

Count of drivers with no kids that work as healthcare practitioners or building/cleaning/maintenance jobs:  38
Coffee House promo acceptance rate of those with no kids and work as healthcare practitioners or building, grounds cleaning and maintenance jobs":
	Acceptance Rate: 0.816
	All others:      0.494



<span style='font-size:20px;'>&#128161;</span> Drivers accepted Coffee House coupons at a significantly higher rate (`%81.6`) when they had no kids and worked as healthcare practitioners or as building or grounds cleaning and maintenance, compared to all others acceptance rate (`%49.4`).

### Findings
`Coffee House` coupons were:
- offered the most out of 5 different coupon categories
- most offered on warm (`80F°`) and sunny days
- accepted at rate (`49.9%`) less than the overall coupuns aggregate acceptance rate (`56.9%`)
- accepted at higher rate (`52%`) on warmer weather (80F°), even compared to the aggragated (`50%`) rate, while colder weather temperatures of `30F°` and `55F°` saw higher rejection rates (`55.6%`, `54.4%`), respectively, when compared to the aggregated rejection rate (`50.1`)
- offered less to drivers that went to `Coffee House` 4 or more times a month, that showed high acceptance rate of (`67.5%`), compare to drivers that went less than 4 times a month, and showed less frequent accepted rate (`36.2%`)
- accepted at a higher rate (`66%`) for drivers who went to coffee houses more than once a month, with compared to the aggregated `Coffee House` acceptance rate (`50%`).
- accepted at the same acceptance rate (`50%`) as the aggregated `Coffee House` rate, since all drivers were at least 5 minutes from the `Coffee House` destination (3996)
- had negative correlation between the distance from the destination in minutes, and the acceptance rate. The longer the distance to the destination, the less acceptance rate of `Coffee House` coupons. 
- rejected at higher rates of (`54.6%`, `65.4`) for drivers that were 15 or 25 minutes away from the coffee house coupon, respectively, compared to the `Coffee House` aggregated coupon rejection rate (`50.1%`)
- accepted at a higher rate (`69.6%`) by younger drivers below the age of 21. That said, we have to consider that this group (drivers under 21), were the age group that was offered the least (`3.8%`) amount of `Coffee House` coupons. 
- rejected at a higher rate (`57.9%`) by older drivers, over 50 years old, compared to all other age groups and also with comparison to the `Coffee House` aggregated rejection rate (`50.1%`)
- offered more (2227 vs 1769 times) with longer expiration period (24H)
- accepted at a higher rate (`58.3%`) by drivers that were offered with with longer expiration period (1 day), compared to the acceptance rate (`43.1%`) of coupons that were offered with shorter expiration period (2 hours). 
- had higher acceptance rate (`70.8%`) by drivers that ate more than 8 times a month at cheap restaurants and had lower income (less than $25K), compared to `Coffee House` coupons that were offered to others (`49.5%`)
- accepted at a higher rate (`%63.8`) by drivers older than 25 and went to coffee houses more than once a month, compared to all others acceptance rate (`%42.8`)
- accepted at a significantly higher rate (`%81.6`) by drivers who had no kids and worked as healthcare practitioners or as building or grounds cleaning and maintenance, compared to all others acceptance rate (`%49.4`). However, we need to be careful with srtong conclusions, since the sample size of this group is fairly small - 38, about 1% of the `Coffee House` coupons.

### Summary
Drivers were just slighly more likely (`52%`) to accept `Coffee House` coupons on warmer weather (`80F°`), compared to colder weather and to the overall aggregated `Coffee House` acceptance rate (`49.9%`). Drivers were more likely to reject the coupons on colder days at higher rates (`55.6%`, `54.4%`), compared to the aggregated `Coffee House` rejection rate (`50.1%`)

Consider offering more coupons to drivers who went to `Coffee House` 4 or more times a month (had a 22% `Coffee House` coupons share), since they showed higher acceptance rate of (`67.5%`), compared to drivers that went less than 4 times a month to coffee houses, and showed acceptance rate of just (`36.2%`)

Consider offering less coupons to drivers who went to `Coffee House` less than 4 times a month (had 72% `Coffee House` coupons share), since they showed higher rejection rate of (`63.8%`), compared to drivers that went 4 or more times in a month to coffee houses, that had only (`32.5%`) rejection rate, and compared to the aggregated `Coffee House` rejection rate (`50.1%`)

Consider offering more coupons to drivers that are 15 minutes or less from their destination, which their share is currently 40.365%, since the acceptance rate (`59.1%`) was higher compared to drivers who were more than 15 minutes from the destination

Consider offering less coupons to drivers that are 15 minutes and had rejection rate of (`54.7%`, `65.5%`) for (more than 15 mins, more than 25 mins), respectively - which was higher than the aggregated (`50.1%`) rejection rate 

Consider offering more coupons to younger drivers (less than 21 old), who showed higher coupon acceptance rate (`69.6%`), but had only (`3.8%`) ocerall share.

Consider offering less coupuns to older drivers, over 50 years old, who rejected coupuns at a higher rate (`57.9%`), compared to all other age groups and also with comparison to the aggregated `Coffee House` rejection rate (`50.1%`)

Consider offering more coupons with longer expiration period (1 day), that were accepted by drivers at a higher rate (`58.3%`), compared to the acceptance rate (`43.1%`) of coupons that were offered with shorter expiration period (2 hours). Altenatively, we can try and extend the 2 hours expiration period window for the short term expiration, and see if we get higher acceptance rate by doing so.

Consider offering more coupons to drivers that drivers who went more than 8 times a month at cheap restaurants and had lower income (less than $25K), since they had had higher acceptance rate (`70.8%`), compared to coupons that were offered to others (`49.5%`). This combination had only (`~2%`) share, and it's worth checking if increasing its share will ead to a higher acceptance rate.