# Assess whether there is a difference between mailer 1 and mailer 2 in terms of sign-up rate to the club

In [7]:
# import packages
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency, chi2 

# import data
df = pd.read_excel("grocery_database.xlsx", sheet_name = "campaign_data")

In [8]:
df.head()

Unnamed: 0,customer_id,campaign_name,campaign_date,mailer_type,signup_flag
0,74,delivery_club,2020-07-01,Mailer1,1
1,524,delivery_club,2020-07-01,Mailer1,1
2,607,delivery_club,2020-07-01,Mailer2,1
3,343,delivery_club,2020-07-01,Mailer1,0
4,322,delivery_club,2020-07-01,Mailer2,1


In [13]:
# cross-tabulate mailer type against sign-up flag to get our observed frequencies
# (one row per mailer_type value, one column per signup_flag value)

observed_values = pd.crosstab(df["mailer_type"], df["signup_flag"]).values
observed_values

array([[140,  19],
       [252, 123],
       [209, 127]], dtype=int64)

In [14]:
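# signup rate = sign-ups / customers in the group, read from observed_values
# (row 1 is treated as Mailer1 and row 2 as Mailer2, matching the counts printed above)
mailer1_signup_rate = observed_values[1, 1] / observed_values[1].sum()
mailer2_signup_rate = observed_values[2, 1] / observed_values[2].sum()
print(mailer1_signup_rate, mailer2_signup_rate)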


0.328 0.37797619047619047


In [15]:
# state hypotheses & set significance level

null_hypothesis = "There is no relationship between mailer type and signup rate."
alternate_hypothesis = "There is a relationship between mailer type and signup rate."
significance_level = 0.05

In [20]:
# calculate expected frequencies & chi square statistic

chi2_statistic, p_value, dof, expected_values = chi2_contingency(observed_values, correction = False)
print(chi2_statistic, p_value)

34.85054342506307 2.7058308848098242e-08
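
For reference, the statistic printed above can be reproduced by hand. The sketch below assumes the observed counts shown in the crosstab output (hard-coded here so it runs without the Excel file) and computes each expected count as row total × column total / grand total. The names `manual_expected`, `manual_chi2` and `manual_dof` are introduced only for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# observed counts copied from the crosstab output
# (rows: mailer_type groups, columns: signup_flag 0 / 1)
observed = np.array([[140, 19],
                     [252, 123],
                     [209, 127]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# expected count under independence: row total * column total / grand total
manual_expected = row_totals * col_totals / grand_total

# chi-square statistic: sum over cells of (observed - expected)^2 / expected
manual_chi2 = ((observed - manual_expected) ** 2 / manual_expected).sum()

# degrees of freedom: (rows - 1) * (columns - 1)
manual_dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# cross-check against scipy (no continuity correction, as in the cell above)
scipy_chi2, _, scipy_dof, scipy_expected = chi2_contingency(observed, correction=False)

print(manual_chi2, manual_dof)
print(np.isclose(manual_chi2, scipy_chi2), np.allclose(manual_expected, scipy_expected))
```

With three rows and two columns the test has 2 degrees of freedom, which is why `dof` feeds a critical value for 2 degrees of freedom in the next cell.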


In [17]:
# find the critical value for our test

critical_value = chi2.ppf(1 - significance_level, dof)
print(critical_value)

5.991464547107979
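
The decision can be read off either the critical value or the p-value, and the two rules always agree. A minimal sketch, assuming the statistic and degrees of freedom printed above (hard-coded here), with `reject_by_critical_value` and `reject_by_p_value` as illustrative names:

```python
from scipy.stats import chi2

chi2_statistic = 34.85054342506307  # copied from the output above
dof = 2                             # (3 rows - 1) * (2 columns - 1)
significance_level = 0.05

# rule 1: reject the null when the statistic exceeds the critical value
critical_value = chi2.ppf(1 - significance_level, dof)
reject_by_critical_value = chi2_statistic > critical_value

# rule 2: reject the null when the p-value is below the significance level
# (sf is the survival function, 1 - cdf, i.e. the upper-tail probability)
p_value = chi2.sf(chi2_statistic, dof)
reject_by_p_value = p_value < significance_level

print(critical_value, p_value)
print(reject_by_critical_value, reject_by_p_value)
```

A statistic larger than the critical value means rejecting the null hypothesis, not retaining it.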


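Because the sample's chi-square statistic (34.85) is greater than the critical value (5.99), and the p-value (2.7e-08) is far below the significance level of 0.05, we reject the null hypothesis: there is a relationship between mailer type and signup rate.

Note that this test has 2 degrees of freedom because the observed table has three rows: Mailer1, Mailer2, and a third mailer_type group in the first row (19 sign-ups out of 159 customers, about 11.9%). It therefore shows that the sign-up rate differs across the three groups, but it does not isolate the difference between Mailer1 (32.8%) and Mailer2 (37.8%). Answering that question needs a test restricted to the Mailer1 and Mailer2 rows.

On this evidence we cannot suggest that they stop sending mailers to save costs, since both mailer groups signed up at a much higher rate than the third group. Whether the expensive-looking mailer is worth its extra cost over the simple one depends on the Mailer1 versus Mailer2 comparison.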
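
To address the question in the title directly, the test can be restricted to the two mailer groups. The sketch below is one way to do that, assuming the second and third rows of the crosstab output are Mailer1 and Mailer2 (as the signup rate cell does) and hard-coding those counts so it runs without the Excel file. The names `mailer_only`, `stat_2x2`, `p_2x2`, `dof_2x2` and `critical_2x2` are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

significance_level = 0.05

# counts copied from the crosstab output: [no sign-up, sign-up]
mailer_only = np.array([[252, 123],   # assumed to be Mailer1
                        [209, 127]])  # assumed to be Mailer2

# 2x2 chi-square test of independence, without Yates' continuity correction
stat_2x2, p_2x2, dof_2x2, _ = chi2_contingency(mailer_only, correction=False)

# critical value for (2 - 1) * (2 - 1) = 1 degree of freedom
critical_2x2 = chi2.ppf(1 - significance_level, dof_2x2)

print(stat_2x2, p_2x2, dof_2x2, critical_2x2)
print("reject null:", stat_2x2 > critical_2x2)
```

Under these assumptions the statistic falls below the critical value for 1 degree of freedom, so this restricted test would not show a significant difference in sign-up rate between the two mailers at the 0.05 level.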