# Bayesian Statistics: A/B Testing

This dataset comes from Kaggle: https://www.kaggle.com/datasets/farhadzeynalli/online-advertising-effectiveness-study-ab-testing

The exploratory analysis can be found in the `AB_Test_EDA.ipynb` file.

We start with 20,000 users who were exposed to either ads or a PSA, in a roughly 60:40 split, respectively.
For each user, the data record the day of the month and the hour of the day with the most ad exposure, along with the number of ads the user saw.
The outcome is whether the user made a purchase; no conversion rate has been calculated yet.

Before calculating it, we should choose a prior for each group.
The data give no insight into what the product is or what industry it belongs to, so I fall back on a general benchmark: the cross-market conversion rate for products is somewhere between 3 and 10 percent. Without more information, I want a weak prior that does not strongly influence the outcome, but I also want it to be somewhat informed, since a 50% conversion rate seems highly unlikely, especially if the company is launching an ad campaign to further drive sales.

For now, I will use a Beta(5, 95) prior for both groups, which sets the prior mean at 0.05 while still allowing a wide range of plausible conversion rates.
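To check that this prior behaves as intended, here is a minimal sketch that summarises it with base R's `qbeta` and `pbeta`. The names `prior.mean`, `prior.interval`, and `prior.p_over_half` are introduced here for illustration only.

```r
# Prior parameters chosen above: Beta(5, 95)
prior.alpha <- 5
prior.beta  <- 95

# Mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta)
prior.mean <- prior.alpha / (prior.alpha + prior.beta)

# Central 95% interval of the prior (illustrative summary, not part of the original analysis)
prior.interval <- qbeta(c(0.025, 0.975), prior.alpha, prior.beta)

# Prior probability that the conversion rate exceeds 50%
prior.p_over_half <- pbeta(0.5, prior.alpha, prior.beta, lower.tail = FALSE)

print(prior.mean)
print(prior.interval)
print(prior.p_over_half)
```

If the interval looks too narrow or too wide for the 3 to 10 percent benchmark, scaling both parameters up or down changes the prior's strength while keeping the mean at 0.05.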

In [1]:
ab <- read.csv('online_ad_AB.csv')

In [4]:
head(ab)

customerID,test.group,made_purchase,days_with_most_add,peak.ad.hours,ad_count
1,ad,False,24,20,5
2,psa,False,21,16,9
3,psa,False,1,18,8
4,ad,False,20,23,7
5,ad,False,3,13,5
6,ad,False,13,22,7


I need to separate the test groups so that I can count purchases and calculate the conversion rate for each.

In [7]:
ad <- ab[ab$test.group == "ad", ]
psa <- ab[ab$test.group == "psa", ]

To get the conversion rate for each group, I need the number of customers with TRUE in made_purchase out of the total number of customers in that subset. There were 20,000 customers sampled with a 60:40 split, which the following should verify.
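One quick way to check the split is to tabulate the group column. The sketch below uses a stand-in data frame, `ab_demo` (a hypothetical name), rebuilt from the group totals reported in this notebook (12,053 ad rows and 7,947 psa rows) so that it runs without the CSV. On the real data the same calls would be applied to `ab$test.group`.

```r
# Stand-in for `ab`, rebuilt from the group totals reported in this notebook;
# the real data frame comes from read.csv() above.
ab_demo <- data.frame(
  test.group = rep(c("ad", "psa"), times = c(12053, 7947))
)

group_counts <- table(ab_demo$test.group)  # number of rows per group
group_share  <- prop.table(group_counts)   # share of all rows per group
total_rows   <- sum(group_counts)

print(total_rows)
print(group_counts)
print(round(group_share, 3))
```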

## PSA

In [11]:
head(psa)

Unnamed: 0,customerID,test.group,made_purchase,days_with_most_add,peak.ad.hours,ad_count
2,2,psa,False,21,16,9
3,3,psa,False,1,18,8
8,8,psa,False,6,22,10
9,9,psa,False,6,15,7
10,10,psa,False,2,19,5
12,12,psa,False,6,16,8


In [35]:
total_psa <- nrow(psa)

In [34]:
purchases_psa <- nrow(psa[psa$made_purchase == 'TRUE', ])

In [49]:
conversion_psa <- round(purchases_psa / total_psa, 4)

## Ad

In [38]:
head(ad)

Unnamed: 0,customerID,test.group,made_purchase,days_with_most_add,peak.ad.hours,ad_count
1,1,ad,False,24,20,5
4,4,ad,False,20,23,7
5,5,ad,False,3,13,5
6,6,ad,False,13,22,7
7,7,ad,False,7,19,6
11,11,ad,False,16,21,10


In [40]:
total_ad <- nrow(ad)

In [41]:
purchases_ad <- nrow(ad[ad$made_purchase == 'TRUE', ])

In [47]:
conversion_ad <- round(purchases_ad / total_ad, 4)

Now to show the results in a table: 

In [55]:
Collected <- matrix(c(purchases_psa, total_psa-purchases_psa, conversion_psa, purchases_ad, total_ad-purchases_ad, conversion_ad), ncol = 3, byrow = TRUE)
colnames(Collected) <- c("Purchased", "No_Purchase", "Observed_Conversion")
rownames(Collected) <- c("PSA", "Ad")
Collected <- as.data.frame(Collected)

In [56]:
Collected

Unnamed: 0,Purchased,No_Purchase,Observed_Conversion
PSA,257,7690,0.0323
Ad,803,11250,0.0666


In [78]:
n.trials <- 100000
prior.alpha <- 5
prior.beta <- 95


In [75]:
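# Posterior for each group: Beta(prior.alpha + purchases, prior.beta + non-purchases)
a.samples <- rbeta(n.trials, purchases_psa + prior.alpha, (total_psa - purchases_psa) + prior.beta)  # PSA
b.samples <- rbeta(n.trials, purchases_ad + prior.alpha, (total_ad - purchases_ad) + prior.beta)  # Ad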
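
The `rbeta` calls above rest on Beta-Binomial conjugacy. As a sketch, write $\theta$ for a group's conversion rate, $s$ for its purchases, and $f$ for its non-purchases (symbols introduced here for illustration). With a $\mathrm{Beta}(\alpha, \beta)$ prior:

$$
p(\theta \mid s, f) \;\propto\; \theta^{s}(1-\theta)^{f} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} \;=\; \theta^{\alpha+s-1}(1-\theta)^{\beta+f-1},
$$

so $\theta \mid s, f \sim \mathrm{Beta}(\alpha + s,\ \beta + f)$. With $\alpha = 5$ and $\beta = 95$, the counts in the table give $\mathrm{Beta}(262, 7785)$ for PSA and $\mathrm{Beta}(808, 11345)$ for Ad.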

In [76]:
p.b_superior <- sum(b.samples > a.samples)/n.trials

In [77]:
p.b_superior
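
The probability above says how often Ad beats PSA, but not by how much. As a hedged sketch, the same posterior draws can be summarised with a credible interval for the difference and for the relative lift. The seed and the names `diff.samples`, `diff.interval`, and `lift.interval` are assumptions added for illustration; the counts are taken from the table above.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch reproducible

n.trials    <- 100000
prior.alpha <- 5
prior.beta  <- 95

# Posterior draws: Beta(prior.alpha + purchases, prior.beta + non-purchases)
a.samples <- rbeta(n.trials, 257 + prior.alpha, 7690 + prior.beta)   # PSA
b.samples <- rbeta(n.trials, 803 + prior.alpha, 11250 + prior.beta)  # Ad

# Absolute difference in conversion rate (Ad minus PSA) for each draw
diff.samples <- b.samples - a.samples

# Share of draws in which Ad converts better than PSA
p.b_superior <- mean(diff.samples > 0)

# Median and central 95% credible interval for the difference and the ratio
diff.interval <- quantile(diff.samples, c(0.025, 0.5, 0.975))
lift.interval <- quantile(b.samples / a.samples, c(0.025, 0.5, 0.975))

print(p.b_superior)
print(diff.interval)
print(lift.interval)
```

An interval for the difference that excludes zero supports the same conclusion as a high `p.b_superior`, while also giving a sense of the plausible effect size.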