#### Prepared for Gabor's Data Analysis

### Data Analysis for Business, Economics, and Policy
by Gabor Bekes and Gabor Kezdi
 
Cambridge University Press 2021

**[gabors-data-analysis.com](https://gabors-data-analysis.com/)**

License: Free to share, modify and use for educational purposes.
Not to be used for commercial purposes.


### CHAPTER 20

**CH20B Fine tuning social media advertising**

using the ab-test-social-media dataset

version 1.0 2021-05-05

In [1]:
import os
import sys
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from scipy.stats import ttest_ind
from statsmodels.stats.power import TTestIndPower

warnings.filterwarnings("ignore")


In [2]:
# Current script folder
current_path = os.getcwd()
dirname = current_path.split("da_case_studies")[0]

# location folders
data_in = dirname + "da_data_repo/ab-test-social-media/clean"
data_out = dirname + "da_case_studies/ch20-ab-test-social-media/"
output = dirname + "da_case_studies/ch20-ab-test-social-media/output/"

func = dirname + "da_case_studies/ch00-tech-prep/"
sys.path.append(func)


## Part I

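Sample size calculation

Sample size calculation with planned rates: a 1% click-through rate, a 5% conversion rate among clicks, and a 20% relative lift to detect, at 80% power and a 5% significance level.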

In [3]:
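# Planned rates: share of shown ads that are clicked, and share of clicks that convert
clickthrough = 0.01
conversion = 0.05

# Baseline conversion rate per ad shown, and the rate after a 20% relative lift
proportionA = clickthrough * conversion
proportionB = proportionA * 1.2

# Effect size between the two proportions (Cohen's h)
es = sms.proportion_effectsize(proportionA, proportionB)
# es = proportionB - proportionA
# solve_power returns the size of one group; * 2 gives the total across both groups
TTestIndPower().solve_power(es, power=0.8, alpha=0.05) * 2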


1722229.753552455

Sample size calculation with lower rates, in line with those in the experiment's summary data (Part II): 0.32% click-through and 0.82% conversion among clicks, with the same 20% relative lift.

In [4]:
# Same calculation as above with lower click-through and conversion rates
clickthrough = 0.0032
conversion = 0.0082

proportionA = clickthrough * conversion
proportionB = proportionA * 1.2

es = sms.proportion_effectsize(proportionA, proportionB)
# es = proportionB - proportionA
TTestIndPower().solve_power(es, power=0.8, alpha=0.05) * 2


32833931.07283028
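
The two totals above can be cross-checked by hand. The sketch below is not part of the original notebook: it assumes the large-sample normal approximation (the notebook uses `TTestIndPower`, which is based on the t distribution, so the results should agree only approximately), and the helper names `cohen_h` and `total_sample_size` are hypothetical. It uses only the standard library.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohen_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def total_sample_size(p1, p2, power=0.8, alpha=0.05):
    """Approximate total n (two equal groups) for a two-sided test.

    Assumption: normal approximation, n per group = 2 * ((z_alpha + z_power) / h)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n_per_group = 2 * ((z_alpha + z_power) / cohen_h(p1, p2)) ** 2
    return 2 * n_per_group


# Rates from In [3] and In [4]; proportionB is a 20% relative lift in both cases
first_total = total_sample_size(0.01 * 0.05, 0.01 * 0.05 * 1.2)
second_total = total_sample_size(0.0032 * 0.0082, 0.0032 * 0.0082 * 1.2)

print(round(first_total), round(second_total))
```

Because the required sample size scales with 1 / h², a small drop in the baseline rate inflates the required number of ad impressions sharply, which is what the jump between the two results reflects.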

## Part II

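p-values of the tests: each outcome (clicks, action) is regressed on the treatment indicator with heteroskedasticity-robust (HC0) standard errors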

In [5]:
summary_data = pd.read_excel(data_in + "/ab-test-summary.xlsx").set_index("action_type")
# summary_data = pd.read_excel("https://osf.io/download/mhybr/").set_index("action_type")


In [6]:
summary_data


action_type,show,clicks,action
Action A,1000000,3323,32
Action B,1000000,3128,21
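
As a quick piece of arithmetic on this table, the pooled click-through and conversion rates can be computed directly from the counts. This sketch is not in the original notebook; the counts are copied by hand from the table above, and the observation that the results appear to match the rates used in cell In [4] is an inference rather than something the notebook states.

```python
# Counts copied from the summary table above
shows = {"Action A": 1_000_000, "Action B": 1_000_000}
clicks = {"Action A": 3323, "Action B": 3128}
actions = {"Action A": 32, "Action B": 21}

total_shows = sum(shows.values())
total_clicks = sum(clicks.values())
total_actions = sum(actions.values())

clickthrough_rate = total_clicks / total_shows    # clicks per ad shown
conversion_rate = total_actions / total_clicks    # actions per click

print(f"click-through rate: {clickthrough_rate:.4f}")
print(f"conversion rate:    {conversion_rate:.4f}")
```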


In [7]:
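# Rebuild observation-level data from the summary counts: one row per ad shown
data = pd.DataFrame(0, columns=["type_b", "clicks", "action"], index=range(0, 2000000))

# NOTE: rows 0-999999 are flagged type_b = 1 and are filled with the Action A counts below,
# so in the outputs that follow type_b = 1 marks Action A and type_b = 0 marks Action B
data.loc[0:999999, "type_b"] = 1

# .loc slices include both endpoints, hence the "- 1" on each upper bound
data.loc[0 : summary_data.loc["Action A", "clicks"] - 1, "clicks"] = 1
data.loc[1000000 : 1000000 + summary_data.loc["Action B", "clicks"] - 1, "clicks"] = 1
data.loc[0 : summary_data.loc["Action A", "action"] - 1, "action"] = 1
data.loc[1000000 : 1000000 + summary_data.loc["Action B", "action"] - 1, "action"] = 1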


In [8]:
data.groupby(["type_b", "clicks"]).count()


type_b,clicks,action
0,0,996872
0,1,3128
1,0,996677
1,1,3323


In [9]:
data.groupby(["type_b", "action"]).count()


type_b,action,clicks
0,0,999979
0,1,21
1,0,999968
1,1,32


In [10]:
smf.ols("clicks ~ 1 + type_b", data).fit(cov_type="HC0").summary()


Dep. Variable:,clicks,R-squared:,0.0
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,5.914
Date:,"Wed, 05 Oct 2022",Prob (F-statistic):,0.015
Time:,18:09:56,Log-Likelihood:,2902000.0
No. Observations:,2000000,AIC:,-5804000.0
Df Residuals:,1999998,BIC:,-5804000.0
Df Model:,1,,
Covariance Type:,HC0,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,0.0031,5.58e-05,56.016,0.000,0.003,0.003
type_b,0.0002,8.02e-05,2.432,0.015,3.78e-05,0.000

Omnibus:,4075093.883,Durbin-Watson:,0.0
Prob(Omnibus):,0.0,Jarque-Bera (JB):,7855996717.433
Skew:,17.522,Prob(JB):,0.0
Kurtosis:,308.031,Cond. No.,2.62


In [11]:
smf.ols("action ~ 1 + type_b", data).fit(cov_type="HC0").summary()


Dep. Variable:,action,R-squared:,0.0
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,2.283
Date:,"Wed, 05 Oct 2022",Prob (F-statistic):,0.131
Time:,18:09:57,Log-Likelihood:,7700500.0
No. Observations:,2000000,AIC:,-15400000.0
Df Residuals:,1999998,BIC:,-15400000.0
Df Model:,1,,
Covariance Type:,HC0,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,2.1e-05,4.58e-06,4.583,0.000,1.2e-05,3e-05
type_b,1.1e-05,7.28e-06,1.511,0.131,-3.27e-06,2.53e-05

Omnibus:,9693574.572,Durbin-Watson:,0.057
Prob(Omnibus):,0.0,Jarque-Bera (JB):,118646783578266.64
Skew:,194.249,Prob(JB):,0.0
Kurtosis:,37733.763,Cond. No.,2.62
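
With a single binary regressor, the OLS slope equals the difference in group means, and the HC0 standard error reduces to the unpooled two-sample formula sqrt(p1(1-p1)/n1 + p0(1-p0)/n0). The sketch below, which is not part of the original notebook, recomputes the `type_b` rows of both regressions from the summary counts alone. `diff_in_rates` is a hypothetical helper, and the p-value assumes the same normal approximation as the `z` column in the tables above.

```python
from math import erfc, sqrt


def diff_in_rates(successes_1, n_1, successes_0, n_0):
    """Difference in two proportions with an unpooled standard error.

    Returns (difference, standard error, z statistic, two-sided p-value).
    """
    p1, p0 = successes_1 / n_1, successes_0 / n_0
    diff = p1 - p0
    se = sqrt(p1 * (1 - p1) / n_1 + p0 * (1 - p0) / n_0)
    z = diff / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under the standard normal
    return diff, se, z, p_value


# In the rebuilt data, type_b = 1 carries the Action A counts
# and type_b = 0 carries the Action B counts
n = 1_000_000
clicks_result = diff_in_rates(3323, n, 3128, n)
action_result = diff_in_rates(32, n, 21, n)

for name, (diff, se, z, p) in [("clicks", clicks_result), ("action", action_result)]:
    print(f"{name}: diff={diff:.3g}, se={se:.3g}, z={z:.3f}, p={p:.3f}")
```

The results should line up with the `type_b` rows above: the difference in click rates is significant at the 5% level, while the difference in action rates is not.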
