# Campaign Result Analysis

In this campaign, customers are split into two groups: a control group and a test group. The test group is further divided between Mailer 1 and Mailer 2. Mailer 1 is a basic, low-cost mailer, while Mailer 2 is printed on high-quality, colorful cardboard.

The analysis has two steps. First, a chi-square test of independence checks whether sign-up is associated with group membership, comparing the test group (Mailer 1 and Mailer 2 combined) against the control group (labelled 'Control' in mailer_type). Then a one-sided two-proportion z-test checks whether Mailer 2 has a significantly higher sign-up rate than Mailer 1.
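For reference, the two test statistics can be written as below. This is a sketch of the standard formulas, assuming SciPy's default Yates continuity correction for a 2×2 table and the pooled-variance z-test that `proportions_ztest` uses by default. The symbols are introduced here for illustration: $O_{ij}$ and $E_{ij}$ are observed and expected counts, $R_i$ and $C_j$ are row and column totals, $N$ is the grand total, and $x_k$, $n_k$, $\hat p_k = x_k / n_k$ are the sign-ups, customers and sign-up rate for mailer $k$.

```latex
\chi^2 = \sum_{i,j} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N}

z = \frac{\hat p_2 - \hat p_1}{\sqrt{\hat p \,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad \hat p = \frac{x_1 + x_2}{n_1 + n_2},
\qquad p\text{-value} = 1 - \Phi(z)
```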

## Import Libraries

In [30]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

## Load Data

In [2]:
df_campaign_data = pd.read_excel('grocery_database.xlsx', sheet_name= 'campaign_data')

In [3]:
df_campaign_data.head()

| | customer_id | campaign_name | campaign_date | mailer_type | signup_flag |
|---|---|---|---|---|---|
| 0 | 74 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 1 | 524 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 2 | 607 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| 3 | 343 | delivery_club | 2020-07-01 | Mailer1 | 0 |
| 4 | 322 | delivery_club | 2020-07-01 | Mailer2 | 1 |


## Chi-Square Test

In [4]:
Ho = 'There is no relationship between treatment group and sign-up rate, they are independent'
Ha = 'There is a relationship between treatment group and sign-up rate, they are not independent'

In [5]:
def Grouper(row):
    if row['mailer_type'] == 'Mailer1' or row['mailer_type'] == 'Mailer2':
        return 'Test'
    else:
        return 'Control'
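The same labelling can be done without a row-wise `apply`. The sketch below uses `Series.isin` with `np.where` on a small hypothetical frame rather than the campaign data; it should give the same labels as `Grouper`, assuming `mailer_type` only takes the values `Mailer1`, `Mailer2` and `Control` (any other value is labelled `Control` by both versions).

```python
import numpy as np
import pandas as pd

# Small hypothetical frame; only the mailer_type column matters here
df = pd.DataFrame({'mailer_type': ['Mailer1', 'Mailer2', 'Control', 'Mailer1']})

# Vectorized equivalent of Grouper: Mailer1/Mailer2 rows become 'Test',
# every other row becomes 'Control'
df['Group'] = np.where(df['mailer_type'].isin(['Mailer1', 'Mailer2']), 'Test', 'Control')

print(df)
```

On large frames the vectorized form avoids calling a Python function once per row.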

In [6]:
df_campaign_data['Group'] = df_campaign_data.apply(Grouper, axis= 1)
df_campaign_data

| | customer_id | campaign_name | campaign_date | mailer_type | signup_flag | Group |
|---|---|---|---|---|---|---|
| 0 | 74 | delivery_club | 2020-07-01 | Mailer1 | 1 | Test |
| 1 | 524 | delivery_club | 2020-07-01 | Mailer1 | 1 | Test |
| 2 | 607 | delivery_club | 2020-07-01 | Mailer2 | 1 | Test |
| 3 | 343 | delivery_club | 2020-07-01 | Mailer1 | 0 | Test |
| 4 | 322 | delivery_club | 2020-07-01 | Mailer2 | 1 | Test |
| ... | ... | ... | ... | ... | ... | ... |
| 865 | 372 | delivery_club | 2020-07-01 | Mailer2 | 1 | Test |
| 866 | 104 | delivery_club | 2020-07-01 | Mailer1 | 1 | Test |
| 867 | 393 | delivery_club | 2020-07-01 | Mailer2 | 1 | Test |
| 868 | 373 | delivery_club | 2020-07-01 | Control | 0 | Control |


In [7]:
treatment_pivot = df_campaign_data.pivot_table(
    columns= 'signup_flag',
    index= 'Group',
    aggfunc= 'size'
)

treatment_pivot['Total'] = treatment_pivot.sum(axis= 1)
treatment_pivot.loc['Total'] = treatment_pivot.sum(axis= 0)

treatment_pivot

| Group | signup_flag = 0 | signup_flag = 1 | Total |
|---|---|---|---|
| Control | 140 | 19 | 159 |
| Test | 461 | 250 | 711 |
| Total | 601 | 269 | 870 |
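The pivot plus the two manual `sum` calls can also be produced by `pd.crosstab` with `margins=True`. The sketch below is self-contained, so it uses a hypothetical stand-in for `df_campaign_data` rebuilt from the Group × signup_flag counts in the table above (one row per customer), not the original customer rows.

```python
import pandas as pd

# Hypothetical stand-in for df_campaign_data, rebuilt from the
# Group x signup_flag counts shown in the table above
counts = {('Control', 0): 140, ('Control', 1): 19,
          ('Test', 0): 461, ('Test', 1): 250}
rows = [(group, flag) for (group, flag), n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=['Group', 'signup_flag'])

# margins=True appends a 'Total' row and column, like the manual sums above
table = pd.crosstab(df['Group'], df['signup_flag'], margins=True, margins_name='Total')
print(table)
```

For the chi-square test itself, only the inner 2×2 block (without the totals) should be passed to `chi2_contingency`, as the notebook does with `iloc[:2, :2]`.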


In [8]:
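# Sign-up rate = customers with signup_flag 1 / all customers in the group
control_rate = treatment_pivot.loc['Control', 1] / treatment_pivot.loc['Control', 'Total']
test_rate = treatment_pivot.loc['Test', 1] / treatment_pivot.loc['Test', 'Total']

print(f'''Control sign-up rate : {round(control_rate * 100, 2)}%
Test sign-up rate    : {round(test_rate * 100, 2)}%
''')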

Control sign-up rate : 11.95%
Test sign-up rate    : 35.16%



In [10]:
treatment_pivot.iloc[:2, :2]

| Group | signup_flag = 0 | signup_flag = 1 |
|---|---|---|
| Control | 140 | 19 |
| Test | 461 | 250 |


In [11]:
stat, pvalue, dof, expected = chi2_contingency(treatment_pivot.iloc[:2, :2])
pvalue

1.79868789769882e-08
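To see where this p-value comes from, the statistic can be recomputed by hand from the 2×2 table above and compared with `chi2_contingency`. This is a sketch assuming SciPy's default behaviour for a 2×2 table, which applies the Yates continuity correction (each |O − E| is shrunk by 0.5 before squaring).

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts from the table above
# rows: Control, Test; columns: signup_flag 0, 1
observed = np.array([[140, 19],
                     [461, 250]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Yates-corrected chi-square statistic and its upper-tail p-value
stat_manual = (((np.abs(observed - expected) - 0.5) ** 2) / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(stat_manual, dof)

# Same test via SciPy for comparison
stat_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)
print(stat_manual, p_manual)
print(stat_scipy, p_scipy)
```

Passing `correction=False` to `chi2_contingency` would give the uncorrected Pearson statistic instead, which is slightly larger for this table.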

In [24]:
def display_result(pvalue, Ho, Ha, alpha=0.05):
    # Reject Ho (and report Ha) when the p-value is at or below the significance level
    print(f'pvalue = {pvalue}')
    if pvalue <= alpha:
        print(Ha)
    else:
        print(Ho)

In [25]:
display_result(pvalue, Ho, Ha)

pvalue = 1.79868789769882e-08
There is a relationship between treatment group and sign-up rate, they are not independent
    


## Z-Test
The chi-square test shows that treatment group and sign-up are not independent, so we proceed to the second test to check whether Mailer 2 produces a significantly higher sign-up rate than Mailer 1.

In [26]:
Ho = 'There is no significant difference in sign-up rates between Mailer1 and Mailer2'
Ha = 'The sign-up rate of Mailer2 is larger than that of Mailer1'

In [27]:
mailer_pivot = df_campaign_data.loc[df_campaign_data['mailer_type'] != 'Control'].pivot_table(
    columns= 'signup_flag',
    index= 'mailer_type',
    aggfunc= 'size'
)

mailer_pivot['Total'] = mailer_pivot.sum(axis= 1)
mailer_pivot.loc['Total'] = mailer_pivot.sum(axis= 0)

mailer_pivot

| mailer_type | signup_flag = 0 | signup_flag = 1 | Total |
|---|---|---|---|
| Mailer1 | 252 | 123 | 375 |
| Mailer2 | 209 | 127 | 336 |
| Total | 461 | 250 | 711 |


In [28]:
m1_signup = mailer_pivot.loc['Mailer1', 1]
m2_signup = mailer_pivot.loc['Mailer2', 1]
m1_total = mailer_pivot.loc['Mailer1', 'Total']
m2_total = mailer_pivot.loc['Mailer2', 'Total']

print(f'''Mailer1 sign-up rate : {round((m1_signup / m1_total) * 100, 2)}%
Mailer2 sign-up rate : {round((m2_signup / m2_total) * 100, 2)}%
''')

Mailer1 sign-up rate : 32.8%
Mailer2 sign-up rate : 37.8%



In [31]:
stat, pvalue = proportions_ztest(
    count= [m2_signup, m1_signup],
    nobs= [m2_total, m1_total],
    alternative= 'larger'
)

pvalue

0.08175576111699284
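The z-test can likewise be reproduced by hand from the mailer table above. This sketch assumes the pooled-variance form that `proportions_ztest` uses by default: under Ho both mailers share one sign-up rate, and `alternative='larger'` tests whether the first proportion (Mailer2) exceeds the second (Mailer1).

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportions_ztest

# Sign-ups and group sizes for each mailer, from the mailer table above
x2, n2 = 127, 336   # Mailer2
x1, n1 = 123, 375   # Mailer1

# Pooled sign-up rate under Ho and the standard error of the difference
p_pool = (x1 + x2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# One-sided test of Ha: Mailer2 rate > Mailer1 rate
z_manual = (x2 / n2 - x1 / n1) / se
p_manual = norm.sf(z_manual)

# Same test via statsmodels for comparison
z_sm, p_sm = proportions_ztest(count=[x2, x1], nobs=[n2, n1], alternative='larger')
print(z_manual, p_manual)
print(z_sm, p_sm)
```

The observed gap of about 5 percentage points corresponds to a z-statistic of roughly 1.39, below the one-sided 5% critical value of about 1.645.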

In [32]:
display_result(pvalue, Ho, Ha)

pvalue = 0.08175576111699284 
There is no significant difference in sign-up rates between Mailer1 and Mailer2
    


## A/B Testing Conclusion

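The A/B test shows that the campaign significantly increased the sign-up rate: 35.16% in the test group versus 11.95% in the control group (chi-square p ≈ 1.8e-08). However, it does not show a statistically significant difference between Mailer 1 (32.8%) and Mailer 2 (37.8%) at the 5% level (one-sided z-test p ≈ 0.082).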
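To put the mailer comparison in terms of effect size rather than only a p-value, one option is a confidence interval for the difference in sign-up rates. This is a sketch using the simple unpooled Wald interval, a choice made here and not part of the original analysis; it is two-sided at 95%, so it is slightly more conservative than the one-sided z-test.

```python
import numpy as np
from scipy.stats import norm

# Sign-ups and group sizes for each mailer, from the mailer table
x2, n2 = 127, 336   # Mailer2
x1, n1 = 123, 375   # Mailer1
p2, p1 = x2 / n2, x1 / n1

# Unpooled (Wald) standard error of the difference in proportions
se_diff = np.sqrt(p2 * (1 - p2) / n2 + p1 * (1 - p1) / n1)

# Two-sided 95% Wald interval for p2 - p1
z_crit = norm.ppf(0.975)
diff = p2 - p1
ci_low, ci_high = diff - z_crit * se_diff, diff + z_crit * se_diff
print(f'Mailer2 - Mailer1: {diff:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}]')
```

With these counts the interval runs from roughly -2 to +12 percentage points. Because it includes zero, a modest advantage for Mailer 2 is plausible, but so is no difference at all; a larger sample would be needed to narrow it.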

## Recommendation

Mailer 1 is the better choice: it is cheaper to produce, and the data show no statistically significant difference in sign-up rate between Mailer 1 and Mailer 2 at the 5% level.