# A/B Testing on Questionnaire Ads

**Background**<br>
SmartAd is a relatively well-known digital marketing agency that helps implement various ad concepts on its clients' websites. On this occasion, it wants to improve the performance of a questionnaire ad shown on a client's website.

**Objective**<br>
SmartAd wants to know whether the new questionnaire ad concept makes users more likely to engage with it.

## Prerequisites

In [844]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import math
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

In [845]:
# Store the dataset path
filename = "ads.csv"

In [846]:
# Import the dataset 
df = pd.read_csv(filename)
df.head(5)

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0


In [847]:
# Check the dataset's shape
df.shape

(8077, 9)

- The dataset has 8077 rows and 9 columns

## Data Cleaning

### Check Data Duplicates

In [848]:
# Check duplicate
df.duplicated().sum()

0

- The dataset has no duplicates

### Check Missing Values

In [849]:
# Check missing values
df.isna().sum()

auction_id     0
experiment     0
date           0
hour           0
device_make    0
platform_os    0
browser        0
yes            0
no             0
dtype: int64

- The dataset has no missing values

In [850]:
df["date"].value_counts()

date
2020-07-03    2015
2020-07-09    1208
2020-07-08    1198
2020-07-04     903
2020-07-10     893
2020-07-05     890
2020-07-06     490
2020-07-07     480
Name: count, dtype: int64

In [851]:
df["device_make"].value_counts()

device_make
Generic Smartphone     4743
iPhone                  433
Samsung SM-G960F        203
Samsung SM-G973F        154
Samsung SM-G950F        148
                       ... 
D5803                     1
Samsung SM-G6100          1
HTC M10h                  1
Samsung SM-G925I          1
XiaoMi Redmi Note 5       1
Name: count, Length: 269, dtype: int64

### Check The Data Type

In [852]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8077 entries, 0 to 8076
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   auction_id   8077 non-null   object
 1   experiment   8077 non-null   object
 2   date         8077 non-null   object
 3   hour         8077 non-null   int64 
 4   device_make  8077 non-null   object
 5   platform_os  8077 non-null   int64 
 6   browser      8077 non-null   object
 7   yes          8077 non-null   int64 
 8   no           8077 non-null   int64 
dtypes: int64(4), object(5)
memory usage: 568.0+ KB


The `date` column needs to be converted to the datetime data type for further use.

In [853]:
df['date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8077 entries, 0 to 8076
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   auction_id   8077 non-null   object        
 1   experiment   8077 non-null   object        
 2   date         8077 non-null   datetime64[ns]
 3   hour         8077 non-null   int64         
 4   device_make  8077 non-null   object        
 5   platform_os  8077 non-null   int64         
 6   browser      8077 non-null   object        
 7   yes          8077 non-null   int64         
 8   no           8077 non-null   int64         
dtypes: datetime64[ns](1), int64(4), object(4)
memory usage: 568.0+ KB


## Data Enrichment

### Create "responded" Variable

In [854]:
df['responded'] = df.apply(lambda row: 1 if row['yes'] != 0 or row['no'] != 0 else 0, axis=1)
df.head(5)

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no,responded
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1,1
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0,0


### Create "accepted" Variable

In [855]:
df['accepted'] = df.apply(lambda row: 1 if row['yes'] == 1 and row['no'] == 0 else 0, axis=1)
df.head(5)

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no,responded,accepted
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1,1,0
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0,0,0


### Rename the Columns for Clarity

In [856]:
# Define the new column names
new_names = {
    'auction_id' : 'id',
    'experiment' : 'test_group',
    'device_make' : 'device_model',
    'platform_os' : 'device_os'
}

# Rename the columns
df = df.rename(columns=new_names)

# See the result
df.head(5)

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,yes,no,responded,accepted
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1,1,0
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0,0,0


### Drop unnecessary columns

In [857]:
# Define the columns to be dropped
columns_to_drop = ['yes', 'no']

# Drop the columns
df = df.drop(columns=columns_to_drop)

# See the result
df.head(5)

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,1,0
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0


In [858]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8077 entries, 0 to 8076
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   id            8077 non-null   object        
 1   test_group    8077 non-null   object        
 2   date          8077 non-null   datetime64[ns]
 3   hour          8077 non-null   int64         
 4   device_model  8077 non-null   object        
 5   device_os     8077 non-null   int64         
 6   browser       8077 non-null   object        
 7   responded     8077 non-null   int64         
 8   accepted      8077 non-null   int64         
dtypes: datetime64[ns](1), int64(4), object(4)
memory usage: 568.0+ KB


### Add Day of Week Column

In [859]:
# Define the column to contain the day of the week of each date
df['day_of_week'] = df['date'].dt.day_name()

# See the result
df.head(5)

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0,Friday
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0,Tuesday
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,1,0,Sunday
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0,Friday
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0,Friday


In [860]:
df['day_of_week'].value_counts()

day_of_week
Friday       2908
Thursday     1208
Wednesday    1198
Saturday      903
Sunday        890
Monday        490
Tuesday       480
Name: count, dtype: int64

In [861]:
df['date'].value_counts()

date
2020-07-03    2015
2020-07-09    1208
2020-07-08    1198
2020-07-04     903
2020-07-10     893
2020-07-05     890
2020-07-06     490
2020-07-07     480
Name: count, dtype: int64

## Calculate the Number of Samples Needed

### Response Rate

In [862]:
# Group the data based on their test group
df_response = df.groupby('test_group').agg({'responded':'sum',
                                      'id':'nunique'}).reset_index()

df_response

Unnamed: 0,test_group,responded,id
0,control,586,4071
1,exposed,657,4006


In [863]:
# Add proportion variable showing the ads response rate
df_response['proportion'] = (df_response['responded']) / df_response['id']


df_response

Unnamed: 0,test_group,responded,id,proportion
0,control,586,4071,0.143945
1,exposed,657,4006,0.164004


The treatment (exposed) group has a higher response rate than the control group: 16.4% vs 14.4%. To find out whether this difference is statistically significant, we run a two-sample proportion z-test.

## Defining Required Sample Size

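The formula for the required sample size per group:

$$n = \frac{(Z_{\alpha} + Z_{\beta})^2 \left[ p_1(1 - p_1) + p_2(1 - p_2) \right]}{(p_1 - p_2)^2}$$

where $p_1$ and $p_2$ are the expected response rates of the two groups, and $Z_{\alpha}$ and $Z_{\beta}$ are the standard normal critical values for the significance level and the desired power.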

In [864]:
# Define a function to calculate the required sample size per group
def sample_size_calc(p1, p2, alpha, test_type, beta):
    # Critical value for the significance level
    if test_type == 'one-sided':
        z_crit_alpha = stats.norm.ppf(1 - alpha)
    elif test_type == 'two-sided':
        z_crit_alpha = stats.norm.ppf(1 - (alpha/2))
    else:
        raise ValueError("test_type must be 'one-sided' or 'two-sided'")

    # Critical value for the desired power (1 - beta)
    z_crit_beta = stats.norm.ppf(1 - beta)
    required_sample_size = (z_crit_alpha + z_crit_beta)**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2)**2

    return required_sample_size

In [865]:
# Calculate the sample size based on the proportions and other parameters
# p1 = df_response[['proportion'][0]][0]
# p2 = df_response[['proportion'][0]][1]
p1 = 0.14
p2 = 0.17
alpha = 0.05
beta = 0.2

sample_size_required = sample_size_calc(p1, p2, alpha, 'one-sided', beta)  # one-sided: we only test whether the exposed rate is higher
sample_size_required

1796.3763513035212

It turns out we need about 1,796 samples in each group (treatment and control).
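
As a cross-check on the figure above, the same formula can be evaluated with only the Python standard library. This is a minimal sketch rather than the notebook's own function: `required_n` is a hypothetical helper name, and `statistics.NormalDist` stands in for `scipy.stats.norm`. Rounding up instead of to the nearest integer, and the two-sided comparison, are illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_n(p1, p2, alpha=0.05, beta=0.2, two_sided=False):
    """Per-group sample size for comparing two proportions (normal approximation)."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(1 - beta)   # critical value for the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2


n_one_sided = required_n(0.14, 0.17)                  # same inputs as the notebook
n_two_sided = required_n(0.14, 0.17, two_sided=True)  # stricter alternative

print(n_one_sided)             # approximately 1796.38, in line with the result above
print(math.ceil(n_one_sided))  # rounding up keeps the power at or above the target
print(math.ceil(n_two_sided))  # a two-sided test needs more samples per group
```

Rounding up gives 1,797 per group, one more than `round()` returns. The difference is negligible here, but rounding up is the safer default when the fractional part is larger.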

## Statistical Test: Two-Sample Proportion Test

In [866]:
# Define the group each day of the week belong to
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
weekends = ['Saturday', 'Sunday']

In [868]:
# Flag weekend rows (1 = Saturday or Sunday, 0 = weekday)
df['is_weekend'] = np.where(df['day_of_week'].isin(weekends), 1, 0)

df.head()

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week,is_weekend
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0,Friday,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0,Tuesday,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,1,0,Sunday,1
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0,Friday,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0,Friday,0


In [None]:
# Group by the test group and whether it's weekend or not
df_gr_weekend = df.groupby(['test_group', 'is_weekend']).agg({'id':'nunique'}).reset_index()
df_gr_weekend

In [841]:
# Show how many data points are taken based on the device OS
df_gr_dos = df.groupby(['test_group', 'device_os']).agg({'id':'nunique'}).reset_index()
df_gr_dos

Unnamed: 0,test_group,device_os,id
0,control,5,308
1,control,6,3763
2,exposed,5,120
3,exposed,6,3885
4,exposed,7,1


### Analysis on the whole week

In [803]:
# Separate the sample data from the whole week by their test group
df_tr = df[(df['test_group'] == 'exposed')]
df_ctrl = df[(df['test_group'] == 'control')]

In [804]:
# Set the number of samples to draw from each test group
n = round(sample_size_required)
random_state = 23

# Select n random rows from each test group
df_tr_sample = df_tr.sample(n=n, random_state=random_state)
df_ctrl_sample = df_ctrl.sample(n=n, random_state=random_state)


# # Select random n rows per group (data grouped by the day of the week) - OBSOLETE - NOT ENOUGH DATA POINTS FOR EACH DAY
# df_tr_sample = df_tr.groupby('day_of_week').apply(lambda x: x.sample(n, random_state=random_state)).reset_index(drop=True)
# df_ctrl_sample = df_ctrl.groupby('day_of_week').apply(lambda x: x.sample(n, random_state=random_state)).reset_index(drop=True)

In [805]:
# Concatenate both treatment and control group into a sample dataframe
df_sample = pd.concat([df_tr_sample, df_ctrl_sample], ignore_index=True)
df_sample.head()

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week
0,3b242ec2-ef54-473b-9876-2a543fb76ac9,exposed,2020-07-04,23,Generic Smartphone,6,Chrome Mobile,0,0,Saturday
1,afab37f9-8881-48eb-b9d7-9b854d618b68,exposed,2020-07-05,5,Generic Smartphone,6,Chrome Mobile,1,1,Sunday
2,030f8197-4daa-4584-a43a-a5805d6947e4,exposed,2020-07-08,20,Generic Smartphone,6,Chrome Mobile,1,0,Wednesday
3,c346153f-9055-4989-9af9-da8b1b26da01,exposed,2020-07-08,14,moto e5 play,6,Chrome Mobile,0,0,Wednesday
4,ce0979fe-adb6-4693-bdbe-b7d655ac11dc,exposed,2020-07-03,13,Generic Smartphone,6,Chrome Mobile,1,1,Friday


In [807]:
df_sample_gr = df_sample.groupby('test_group').agg({'responded':'sum', 'id':'nunique'}).reset_index()

df_sample_gr['proportion'] = (df_sample_gr['responded']) / df_sample_gr['id']
df_sample_gr.head()

Unnamed: 0,test_group,responded,id,proportion
0,control,254,1796,0.141425
1,exposed,298,1796,0.165924


In [808]:
number_of_successes = df_sample_gr['responded'].tolist()
print('Number of Successes:', number_of_successes)

total_sample_sizes = df_sample_gr['id'].tolist()
print('Total Sample Sizes:',total_sample_sizes)

Number of Successes: [254, 298]
Total Sample Sizes: [1796, 1796]


In [809]:
test_stat, p_value = proportions_ztest(number_of_successes, total_sample_sizes, alternative='smaller')
print('Test Stat:', test_stat)
print('P-value:', p_value)

Test Stat: -2.0357034366711537
P-value: 0.020890071898846684
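
To make the test above less of a black box, the statistic can be recomputed by hand. The sketch below assumes the pooled-variance form of the two-proportion z-test and a left-tailed alternative (control rate lower than exposed rate). `two_proportion_ztest` is a hypothetical helper, and `statistics.NormalDist` replaces the `statsmodels` call purely for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(successes, totals):
    """Pooled z-test for H1: rate of group 0 < rate of group 1 (left-tailed)."""
    (x1, x2), (n1, n2) = successes, totals
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = NormalDist().cdf(z)   # area in the left tail
    return z, p_value


# Counts from the whole-week sample above: [control, exposed]
z, p_value = two_proportion_ztest([254, 298], [1796, 1796])
print(z, p_value)  # close to -2.0357 and 0.0209, as reported above

alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Since the p-value is below 0.05, the whole-week sample supports the claim that the exposed group responds at a higher rate than the control group.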


### Analysis by the Day of the Week

In [810]:
# Define the group each day of the week belong to
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
weekends = ['Saturday', 'Sunday']

#### Response Rate Performance on Weekdays

In [811]:
# Select the sample data from weekdays only
df_tr = df[(df['test_group'] == 'exposed') & (df['day_of_week'].isin(weekdays))]
df_ctrl = df[(df['test_group'] == 'control') & (df['day_of_week'].isin(weekdays))]

## Select the sample data from weekends only -- NOT USED
# df_tr = df[(df['test_group'] == 'exposed') & (df['day_of_week'].isin(weekends))]
# df_ctrl = df[(df['test_group'] == 'control') & (df['day_of_week'].isin(weekends))]

In [830]:
n = round(sample_size_required)

df_tr_sample = df_tr.sample(n=n, random_state=23)
df_ctrl_sample = df_ctrl.sample(n=n, random_state=23)

In [813]:
# Concatenate both treatment and control group into a sample dataframe
df_sample = pd.concat([df_tr_sample, df_ctrl_sample], ignore_index=True)
df_sample.head()

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week
0,6d906653-d642-47b2-9b03-793dbef091cd,exposed,2020-07-10,9,Generic Smartphone,6,Chrome Mobile,0,0,Friday
1,c957e01e-8bb6-4f39-a3af-9d5b7517e6e3,exposed,2020-07-09,7,Samsung SM-G928F,6,Facebook,0,0,Thursday
2,f7d8908b-83d8-4b23-8e1d-2ad47ab82e4c,exposed,2020-07-03,11,Generic Smartphone,6,Chrome Mobile,0,0,Friday
3,11b88ee5-c76f-4680-ad22-b45d99df8cfb,exposed,2020-07-08,20,Generic Smartphone,6,Chrome Mobile,0,0,Wednesday
4,aa6c6cda-e498-4e8f-b886-1d969bd376ea,exposed,2020-07-08,14,Samsung SM-A202F,6,Samsung Internet,0,0,Wednesday


In [814]:
df_sample['day_of_week'].value_counts()

day_of_week
Friday       1636
Thursday      698
Wednesday     692
Monday        284
Tuesday       282
Name: count, dtype: int64

In [815]:
df_sample_gr = df_sample.groupby('test_group').agg({'responded':'sum', 'id':'nunique'}).reset_index()

df_sample_gr['proportion'] = (df_sample_gr['responded']) / df_sample_gr['id']
df_sample_gr.head()

Unnamed: 0,test_group,responded,id,proportion
0,control,251,1796,0.139755
1,exposed,288,1796,0.160356


In [816]:
number_of_successes = df_sample_gr['responded'].tolist()
print('Number of Successes:', number_of_successes)

total_sample_sizes = df_sample_gr['id'].tolist()
print('Total Sample Sizes:',total_sample_sizes)

Number of Successes: [251, 288]
Total Sample Sizes: [1796, 1796]


In [817]:
test_stat, p_value = proportions_ztest(number_of_successes, total_sample_sizes, alternative='smaller')
print('Test Stat:', test_stat)
print('P-value:', p_value)

Test Stat: -1.7286700325308286
P-value: 0.04193408384759775


#### Response Rate Performance on Weekends

In [831]:
# Select the sample data from weekends only
df_tr = df[(df['test_group'] == 'exposed') & (df['day_of_week'].isin(weekends))]
df_ctrl = df[(df['test_group'] == 'control') & (df['day_of_week'].isin(weekends))]

In [833]:
# Because there are not enough data points, we use all of them as the sample
df_tr_sample = df_tr
df_ctrl_sample = df_ctrl

In [834]:
# Concatenate both treatment and control group into a sample dataframe
df_sample = pd.concat([df_tr_sample, df_ctrl_sample], ignore_index=True)
df_sample.head()

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week
0,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,1,0,Sunday
1,004940f5-c642-417a-8fd2-c8e5d989f358,exposed,2020-07-04,0,Generic Smartphone,6,Chrome Mobile WebView,0,0,Saturday
2,0073f9a6-0856-44b4-870d-8e525878ad29,exposed,2020-07-04,15,Samsung SM-G960F,6,Chrome Mobile WebView,0,0,Saturday
3,008aafdf-deef-4482-8fec-d98e3da054da,exposed,2020-07-04,16,Generic Smartphone,6,Chrome Mobile,1,1,Saturday
4,0110d73e-9308-4794-85d8-2771d1f351b9,exposed,2020-07-05,18,Generic Smartphone,6,Chrome Mobile,0,0,Sunday


In [835]:
df_sample['day_of_week'].value_counts()

day_of_week
Saturday    903
Sunday      890
Name: count, dtype: int64

In [836]:
df_sample_gr = df_sample.groupby('test_group').agg({'responded':'sum', 'id':'nunique'}).reset_index()

df_sample_gr['proportion'] = (df_sample_gr['responded']) / df_sample_gr['id']
df_sample_gr.head()

Unnamed: 0,test_group,responded,id,proportion
0,control,111,788,0.140863
1,exposed,165,1005,0.164179


**Note:** The response rate in the treatment group is higher than in the control group (16.4% vs 14.1%). However, both weekend groups (788 control, 1,005 exposed) fall short of the 1,796 samples required per group, so we do not run the statistical test on this breakdown.
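
One way to quantify the sample size shortfall is to estimate the power the weekend groups would actually give. The sketch below is a rough normal-approximation calculation for a one-sided test, using the planning rates of 0.14 and 0.17 from the sample size calculation. `approx_power` is a hypothetical helper, not part of the notebook.

```python
import math
from statistics import NormalDist


def approx_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a one-sided two-proportion z-test (normal approximation)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE under the alternative
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


# Weekend group sizes from the table above, with the planning rates 0.14 and 0.17
weekend_power = approx_power(0.14, 0.17, n1=788, n2=1005)

# Reference point: the per-group size from the sample size calculation, rounded up
planned_power = approx_power(0.14, 0.17, n1=1797, n2=1797)

print(f"weekend power: {weekend_power:.2f}")
print(f"planned power: {planned_power:.2f}")
```

With the weekend group sizes the estimated power falls below the 80% target, which is why a non-significant weekend result would be hard to interpret.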

### Analysis by the OS Type

In [818]:
# List down all the device OS collected
df['device_os'].value_counts()

device_os
6    7648
5     428
7       1
Name: count, dtype: int64

#### Analysis on Device OS 5

In [819]:
# Select the sample data from device os 5 only

device_os = 5

df_tr = df[(df['test_group'] == 'exposed') & (df['device_os'] == device_os)]
df_ctrl = df[(df['test_group'] == 'control') & (df['device_os'] == device_os)]

In [820]:
# Because there are not enough data points, we use all of them as the sample
df_tr_sample = df_tr
df_ctrl_sample = df_ctrl

In [821]:
# Concatenate both treatment and control group into a sample dataframe
df_sample = pd.concat([df_tr_sample, df_ctrl_sample], ignore_index=True)
df_sample.head()

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week
0,00d22463-2d1f-44de-8d67-11d6c6b96a00,exposed,2020-07-03,1,iPhone,5,Chrome Mobile iOS,0,0,Friday
1,0467d28b-c734-4ba7-b3aa-718d12097ba1,exposed,2020-07-07,11,iPhone,5,Chrome Mobile iOS,0,0,Tuesday
2,04d92e6c-db97-474a-8d59-8907ddbe9755,exposed,2020-07-06,5,iPhone,5,Mobile Safari,0,0,Monday
3,053943f2-6aed-4322-b27f-644d69abc3dc,exposed,2020-07-08,19,iPhone,5,Mobile Safari,0,0,Wednesday
4,0b47c542-41e9-4d25-ab4a-b01ea48a803d,exposed,2020-07-07,19,iPhone,5,Mobile Safari,0,0,Tuesday


In [822]:
df_sample_gr = df_sample.groupby('test_group').agg({'responded':'sum', 'id':'nunique'}).reset_index()

df_sample_gr['proportion'] = (df_sample_gr['responded']) / df_sample_gr['id']
df_sample_gr.head()

Unnamed: 0,test_group,responded,id,proportion
0,control,13,308,0.042208
1,exposed,4,120,0.033333


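**Note:** On Device OS 5, the response rate is low in both groups, and lower for the exposed group (3.3%) than for the control group (4.2%). However, both groups (308 control, 120 exposed) are far below the required sample size, so we do not run the statistical test for this device OS.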
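
Besides the overall sample size, the raw counts on Device OS 5 are small enough that the normal approximation behind the z-test becomes questionable. The sketch below applies a common rule of thumb (at least 10 successes and 10 failures per group). The threshold of 10 is a convention, not something taken from this notebook, and some references use 5 instead. `normal_approx_ok` is a hypothetical helper.

```python
def normal_approx_ok(successes, n, minimum=10):
    """Rule of thumb: a group needs at least `minimum` successes and failures."""
    return successes >= minimum and (n - successes) >= minimum


# Device OS 5 counts from the table above: (responded, group size)
os5 = {"control": (13, 308), "exposed": (4, 120)}

for group, (responded, n) in os5.items():
    print(group, normal_approx_ok(responded, n))
```

The exposed group has only 4 responses, so it fails the check under either threshold.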

#### Analysis on Device OS 6

In [823]:
# Select the sample data from Device OS 6 only

device_os = 6

df_tr = df[(df['test_group'] == 'exposed') & (df['device_os'] == device_os)]
df_ctrl = df[(df['test_group'] == 'control') & (df['device_os'] == device_os)]

In [824]:
# Select random n rows for each test group
n = round(sample_size_required)

df_tr_sample = df_tr.sample(n=n, random_state=23)
df_ctrl_sample = df_ctrl.sample(n=n, random_state=23)

In [825]:
# Concatenate both treatment and control group into a sample dataframe
df_sample = pd.concat([df_tr_sample, df_ctrl_sample], ignore_index=True)
df_sample.head()

Unnamed: 0,id,test_group,date,hour,device_model,device_os,browser,responded,accepted,day_of_week
0,e0052f0e-6137-458f-8738-4e9110557194,exposed,2020-07-09,14,Generic Smartphone,6,Chrome Mobile,0,0,Thursday
1,af9a163f-3262-4281-9c29-284e783d003f,exposed,2020-07-10,12,I3312,6,Chrome Mobile WebView,0,0,Friday
2,3ea116e9-5fd7-400e-8e5f-a8b6f9d79595,exposed,2020-07-06,8,Generic Smartphone,6,Chrome Mobile,0,0,Monday
3,ed4e37b1-7f48-4db4-afae-1fc67fdf21db,exposed,2020-07-06,1,Samsung SM-G930V,6,Facebook,0,0,Monday
4,3e97a9e2-f0e7-49db-acd5-4335357b38b9,exposed,2020-07-08,15,Samsung SM-G975F,6,Chrome Mobile WebView,0,0,Wednesday


In [826]:
df_sample_gr = df_sample.groupby('test_group').agg({'responded':'sum', 'id':'nunique'}).reset_index()

df_sample_gr['proportion'] = (df_sample_gr['responded']) / df_sample_gr['id']
df_sample_gr.head()

Unnamed: 0,test_group,responded,id,proportion
0,control,266,1796,0.148107
1,exposed,286,1796,0.159243


In [827]:
number_of_successes = df_sample_gr['responded'].tolist()
print('Number of Successes:', number_of_successes)

total_sample_sizes = df_sample_gr['id'].tolist()
print('Total Sample Sizes:',total_sample_sizes)

Number of Successes: [266, 286]
Total Sample Sizes: [1796, 1796]


In [828]:
test_stat, p_value = proportions_ztest(number_of_successes, total_sample_sizes, alternative='smaller')
print('Test Stat:', test_stat)
print('P-value:', p_value)

Test Stat: -0.925319743941434
P-value: 0.17739980577213543


**Note:** With a p-value of about 0.177, above the 0.05 significance level, the difference on Device OS 6 is not statistically significant.

## Conclusion and Recommendation

After running the two-sample proportion z-test on the whole-week sample (p ≈ 0.021 at α = 0.05), we can reject the null hypothesis and conclude that the response rate of the interactive ad is higher than that of the static one.

Even though the interactive ad (treatment) performs statistically significantly better than the static ad (control), it is still inconclusive whether the improvement will have a meaningful impact on the business.

That said, incorporating the monetary impact of the improved performance into the analysis would be very beneficial to the business.

Moreover, due to the small sample sizes, we cannot break down the statistical test by device type. With more data, such a breakdown could provide a more granular analysis.
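
A possible first step toward judging business impact is to put a range on the size of the uplift rather than only testing whether it exists. The sketch below computes a Wald confidence interval for the difference in response rates from the full-dataset counts in the Response Rate table. `diff_confidence_interval` is a hypothetical helper, the 95% level is an assumption, and `monthly_impressions` is an invented placeholder, not a figure from the dataset.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(x_ctrl, n_ctrl, x_trt, n_trt, level=0.95):
    """Wald confidence interval for (treatment rate - control rate)."""
    p_ctrl, p_trt = x_ctrl / n_ctrl, x_trt / n_trt
    diff = p_trt - p_ctrl
    se = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_trt * (1 - p_trt) / n_trt)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return diff, diff - z * se, diff + z * se


# Full-dataset counts: control 586 of 4071, exposed 657 of 4006
diff, low, high = diff_confidence_interval(586, 4071, 657, 4006)
print(f"uplift: {diff:.4f}, 95% CI: ({low:.4f}, {high:.4f})")

# Hypothetical business input: impressions served per month (not from the dataset)
monthly_impressions = 100_000
print(f"extra responses per month: {low * monthly_impressions:.0f} "
      f"to {high * monthly_impressions:.0f}")
```

Multiplying the interval by a value per response would turn this range into a monetary estimate, provided the business can supply that value.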

### Acceptance Rate (Obsolete - Not Enough Samples)

In [149]:
# df_acceptance = df[df['responded'] == 1].groupby('test_group').agg({'accepted':'sum', 'id':'nunique'}).reset_index()
# df_acceptance
# df_acceptance['proportion'] = (df_acceptance['accepted']) / df_response['id']
# df_acceptance

Unnamed: 0,test_group,accepted,id,proportion
0,control,264,586,0.064849
1,exposed,308,657,0.076885
