# A/B Testing with T-Test

This dataset comes from https://www.kaggle.com/datasets/farhadzeynalli/online-advertising-effectiveness-study-ab-testing
The exploratory analysis can be found in the AB_Test_EDA.ipynb file

**Background**
A large company with a substantial user base plans to increase sales through advertisement on its website. However, they are undecided about whether the ads increase sales or not. In order to determine whether this is the case, 20,000 customers were subjected to A/B testing for 31 days. 

Each customer was assigned a unique identifier, and the groups were split in a 60:40 ratio, with the larger group seeing the new advertisement and the smaller group seeing a public service announcement (PSA) in its place. The outcome measured was whether the customer made a purchase, recorded as a boolean value. Additional metrics collected included the day of the month on which the user saw the most ads, the hour of the day at which the user saw the most ads, and the total number of ads seen by each user. 

The company has contracted the analysis out with the above information. 

**Follow-up**
The analysis prompt of whether to run the ad was poorly-defined, so the following questions were posed to the project manager: 
*Critical Questions*
1. Is the PSA currently running, or is this also a new addition to the campaign? 
    *I need to know whether this is truly a control group, or if we're testing two new products compared to one another.* 
2. What is your current conversion rate? 
3. What is the minimum percentage increase in conversion that you would need to take action on moving forward with the campaign?
    *This would be critical prior to collecting the data to ensure enough people were sampled, but at this point, we can only verify that we have enough to identify this with an appropriate confidence level*  

*Additional Questions*
4. Are you looking for insight into which day or days of the week to run the campaign, or is this irrelevant?
    If so, what month-year was the 31-day test performed? 
5. Are you trying to target a certain demographic to boost sales during certain times of day?
6. Do you want to know how many ads may yield different results? 
    *In other words, we are reframing the question of whether ads drive conversion to how many ads are needed to increase conversion if at all.* 


**Responses**
1. Since this is run on the company's own website, the PSA is currently a placeholder showing sale prices of certain products. It is in the same format as the rest of the webpage and doesn't particularly stand out. The new ad would replace the PSA, showing a featured product that links to a page with sale prices. 
    *The PSA is truly a control group*
2. The metrics I've been provided suggest the current conversion is around 3 or 4 percent. We're really hoping to bump it up to 7 or 8 percent. 
3. We'd like to see a 3% to 4% increase in sales, but we'll continue with the ad campaign with at least a 2% increase. 
4. Right now we're running the ad on our own website, but we'd like to see metrics on that for potential ad campaigns on different sites.
6. Yes. We would like to see whether the number of times the client visited the website correlates with purchases. Specifically: 
    Was the number of visits on the page with either format correlated with conversion? 
    Was the number of visits on the page with the PSA correlated with conversion? 
    Was the number of visits on the page with the Ad correlated with conversion? 
    Is there a difference between the number of visits with the PSA and number of visits with the Ad? 


---
## Analytical Preparation
We can define the control and treatment groups for the study as the PSA and the ad, respectively. It's also important to define the metric that we are comparing, and here it is the conversion rate, which is the mean of the boolean purchase indicator in each group. Because each group's conversion rate is a sample mean and both groups are large, a T-Test is an appropriate statistical test to evaluate whether there is a significant difference between the means of the control and treatment; at these sample sizes it gives essentially the same answer as a two-proportion z-test. Since the treatment could have either an increased or a decreased mean with respect to the control, a Two-Tailed T-Test is the more appropriate choice. 

H<sub>o</sub>:  Null Hypothesis 
H<sub>a</sub>:  Alternative Hypothesis
p:              Control Group Conversion Rate (PSA)
p<sub>o</sub>:  Treatment Group Conversion Rate (Ad)

**Null Hypothesis**

H<sub>o</sub>: p = p<sub>o</sub>
There will be no difference between the conversion rates of the control (PSA) and the treatment (Ad)

**Alternative Hypothesis**

H<sub>a</sub>: p $\not=$ p<sub>o</sub>
There will be a significant difference between the conversion rates of the control (PSA) and the treatment (Ad)

**Significance and Power**

$\alpha$ = 0.05
$\beta$ = 0.20 (power = $1 - \beta$ = 0.80)

Per convention, $\alpha$ was set to 5%. This is the probability of a Type I Error (false positive), that is, of rejecting the null hypothesis when it is actually true. A difference is considered significant if and only if the p-value is under $\alpha$. 
Also per convention, statistical power ($1 - \beta$) was set to 80%, where $\beta$ = 20% is the probability of a Type II Error (false negative). Power is the likelihood of detecting a true effect if there is one. It will be used in the power analysis to verify that our sample meets the minimum size required to achieve the desired significance level, effect size, and statistical power. 

---
## Experimental Preparation

Typically, a Power Analysis is done prior to implementing a study and collecting data. The client has already run the test and collected the data, and they have not provided us with an exact current conversion rate. The inputs we need for this analysis are the effect size, statistical power, alpha, and the ratio of the number of observations in the treatment group to those in the control group. For this we will go with the low end of their estimated current conversion rate, 3%, and the minimum change they would act on, 2 percentage points, giving a target rate of 5%. The smaller the difference between the two means, the more observations must be collected to be confident that an observed difference is real and not due to random sampling error.  
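
How these inputs interact can be made concrete with the usual normal-approximation formula for two independent groups. The sketch below is an illustration, not the notebook's own method: it uses only the standard library, the helper names `cohens_h` and `control_group_size` are hypothetical, and the 6% and 7% target rates are assumed values included only to show how the requirement shrinks as the difference grows.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions (arcsine transform)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def control_group_size(p_control, p_treatment, alpha=0.05, power=0.80, ratio=1.5):
    """Approximate control-group size for a two-sided test.

    ratio is (treatment size) / (control size). This closed form is a
    normal approximation that ignores the negligible far-tail term.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    h = cohens_h(p_control, p_treatment)
    return (1 + 1 / ratio) * (z_alpha + z_power) ** 2 / h ** 2


baseline = 0.03
sizes = {}
for target in (0.05, 0.06, 0.07):  # 0.06 and 0.07 are assumed, for comparison only
    sizes[target] = ceil(control_group_size(baseline, target))
    print(f"{baseline:.0%} -> {target:.0%}: at least {sizes[target]} control observations")
```

The required sample grows with the inverse square of the effect size, which is why the smallest change the client would act on drives the requirement.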

In [5]:
# Dependencies
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from math import ceil
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Some plot styling preferences
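# The 'seaborn-whitegrid' style name was deprecated in matplotlib 3.6,
# so seaborn's own whitegrid style is applied instead
sns.set_style('whitegrid')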
font = {'weight': 'bold',
        'size': 14}

mpl.rc('font', **font)

random_state = 621

## Power Analysis

In [19]:
effect_size = sms.proportion_effectsize(0.03, 0.05)
effect_size

-0.10286079052330155

In [20]:
required_observations = sms.NormalIndPower().solve_power(
    effect_size, 
    power = 0.8, 
    alpha = 0.05, 
    ratio=1.5  # nobs2 / nobs1: the treatment group (60%) is 1.5 times the size of the control group (40%)
)

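# solve_power returns the size of sample 1 (the control group here)
# ceil is the ceiling function; it rounds up to the next whole number
required_observations = ceil(required_observations)
print(f'At least {required_observations} observations are needed in the control group, and 1.5 times as many in the treatment group, for 80% power to detect a 2-percentage-point difference at alpha = 0.05.')


At least 1237 observations are needed in the control group, and 1.5 times as many in the treatment group, for 80% power to detect a 2-percentage-point difference at alpha = 0.05.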


Since the smaller of our groups (the PSA control) has nearly 8,000 observations, well above this minimum, we have more than enough data to detect a change of this size. 
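
One way to sanity-check a power calculation like this is by simulation. The sketch below is not part of the original analysis: it draws synthetic experiments with the standard library, applies a pooled two-proportion z-test to each, and counts how often the null hypothesis is rejected. The helper names, the 400 trials, the seed, and the treatment size of 1,855 (about 1.5 times 1,237) are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejects_null(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided pooled z-test for two proportions; True if H0 is rejected."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # identical all-or-nothing outcomes in both groups
        return False
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(p_control, p_treatment, n_control, n_treatment, trials, rng):
    """Share of simulated experiments in which the test rejects H0."""
    rejections = 0
    for _ in range(trials):
        conv_c = sum(rng.random() < p_control for _ in range(n_control))
        conv_t = sum(rng.random() < p_treatment for _ in range(n_treatment))
        rejections += rejects_null(conv_c, n_control, conv_t, n_treatment)
    return rejections / trials


rng = random.Random(621)             # arbitrary seed for repeatability
n_control, n_treatment = 1237, 1855  # assumed sizes, roughly a 40:60 split

# No true difference: the rejection rate estimates the Type I error rate.
false_positive_rate = rejection_rate(0.03, 0.03, n_control, n_treatment, 400, rng)
# True rates of 3% and 5%: the rejection rate estimates the power.
power = rejection_rate(0.03, 0.05, n_control, n_treatment, 400, rng)

print(f"Estimated Type I error rate: {false_positive_rate:.3f}")
print(f"Estimated power: {power:.3f}")
```

With only 400 trials the estimates are noisy, so they should land near 0.05 and 0.80 rather than exactly on them.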

---
## Data Exploration and Cleaning

 

In [3]:
ab = pd.read_csv('online_ad_AB.csv', index_col=0)
ab.head()

| customerID | test group | made_purchase | days_with_most_add | peak ad hours | ad_count |
|---|---|---|---|---|---|
| 1 | ad | False | 24 | 20 | 5 |
| 2 | psa | False | 21 | 16 | 9 |
| 3 | psa | False | 1 | 18 | 8 |
| 4 | ad | False | 20 | 23 | 7 |
| 5 | ad | False | 3 | 13 | 5 |


In [7]:
control_sample = ab[ab['test group'] == 'psa'].sample(
    n=1000, random_state=random_state)

treatment_sample = ab[ab['test group'] == 'ad'].sample(
    n=1000, random_state=random_state)

ab_sampled = pd.concat([control_sample, treatment_sample], axis=0)
ab_sampled.reset_index(drop=True, inplace=True)

In [9]:
conversion_rates = ab_sampled.groupby('test group')['made_purchase']

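# Std. deviation of the 0/1 purchase indicator (population form, ddof=0)
def std_p(x): return np.std(x, ddof=0)
# Std. error of the proportion (std / sqrt(n))
def se_p(x): return stats.sem(x, ddof=0)


# The string 'mean' avoids the FutureWarning newer pandas raises for np.mean in agg
conversion_rates = conversion_rates.agg(['mean', std_p, se_p])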
conversion_rates.columns = ['conversion_rate', 'std_deviation', 'std_error']

conversion_rates.style.format('{:.3f}')


| test group | conversion_rate | std_deviation | std_error |
|---|---|---|---|
| ad | 0.058 | 0.234 | 0.007 |
| psa | 0.037 | 0.189 | 0.006 |
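
With both sample rates in hand, the hypotheses defined earlier can be checked directly. The following is a minimal sketch, not the notebook's own test: it hand-rolls a pooled two-proportion z-test with the standard library so that each step of the arithmetic is visible. The purchase counts of 58 and 37 are inferred from the rounded rates in the table and the 1,000 rows sampled per group. In practice, statsmodels' `proportions_ztest`, imported earlier, is designed for this kind of comparison and would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist

# Counts inferred from the sampled table: rate * 1000 rows per group
n_ad, n_psa = 1000, 1000
conv_ad, conv_psa = 58, 37

rate_ad = conv_ad / n_ad
rate_psa = conv_psa / n_psa

# Under H0 both groups share one conversion rate, so pool the two samples
pooled = (conv_ad + conv_psa) / (n_ad + n_psa)
se = sqrt(pooled * (1 - pooled) * (1 / n_ad + 1 / n_psa))

z = (rate_ad - rate_psa) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed

alpha = 0.05
print(f"z = {z:.3f}, two-tailed p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0", f"at alpha = {alpha}")
```

This applies only to the 1,000-row samples drawn above, so a different random seed or the full dataset could give a different result.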

