# A/B testing

## Overview
- Used to measure the effect that a change produces
- E.g., the change in CTR (click-through rate) after redesigning a banner
- Best suited to testing incremental changes (not major changes), because the goal is to measure the effect of one small change. A major change alters too many things at once, so any difference in the result could come from many different sources
    - Not suitable for new products, new branding, or a completely new UX


## Example

<img src="https://cosmiccoding.com.au/static/img/tutorials/abtests/2020-01-12-ABTests_1_0.jpg" style="height:300px">

1. Situation:
    - Unhappy with the old website because the `Add to cart` button is not highlighted
    - Intuition: could highlighting it increase conversion rates?
    
2. Design:
    - A: old version of the website
    - B: new version of the website
    - Some customers are directed to version A
    - The others are directed to version B
    
3. Note:
    - The only change is the button (a small, incremental change)
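
The split described in the design step can be sketched as follows. This is a minimal illustration, not the method used to produce the data in this notebook: `assign_variant`, `share_b`, and the `salt` value are hypothetical names, and hashing the user id is just one common way to keep each visitor on the same version across visits.

```python
import hashlib

def assign_variant(user_id: str, share_b: float = 0.5,
                   salt: str = "add-to-cart-button") -> str:
    """Return "A" or "B"; the same user_id always maps to the same version.

    share_b and salt are hypothetical parameters chosen for this sketch.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "B" if bucket < share_b else "A"

# Assign 10,000 made-up visitor ids and count how many land in each group.
visitors = [f"user-{i}" for i in range(10_000)]
groups = [assign_variant(v) for v in visitors]
print(groups.count("A"), groups.count("B"))
```

Because the assignment depends only on the id and the salt, a returning visitor sees the same version every time, and changing `share_b` shifts the share of traffic sent to B.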

## Sample data

In [1]:
import numpy as np

In [2]:
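# Simulate data
np.random.seed(1)
a = np.random.binomial(1, 0.08, size=100000) # 100K customers visit the old version

# Reusing seed 1 replays the same random stream, so b is correlated with a;
# use a different seed if independent samples are needed
np.random.seed(1)
b = np.random.binomial(1, 0.12, size=150000) # 150K customers visit the new version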

In [3]:
# View data
print(a[:100])
print(b[:100])

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0]


In [4]:
# Compute CTR
rate_a = np.mean(a)
rate_b = np.mean(b)

print(rate_a)
print(rate_b)

0.08029
0.12012


- Observation: the new version appears to perform better than the old one (the rate rises from 8% to 12%)
- Question: is this improvement STATISTICALLY SIGNIFICANT, or did it just happen by chance?

## Test

- Null hypothesis ($H_0$): there is no difference in the underlying rate between the test group (B) and the control group (A)
- Alternative hypothesis ($H_a$): there is a difference in the underlying rate between the test group (B) and the control group (A)

In [5]:
# Test in Python
from scipy import stats
stats.ttest_ind(b, a)

Ttest_indResult(statistic=32.00033685997921, pvalue=3.075864376906776e-224)

#### Interpret the results
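- t-statistic: expresses the difference between the 2 groups in units of standard error. A larger absolute value means stronger evidence against $H_0$, i.e. more support for the alternative hypothesis $H_a$
- pvalue: the probability of observing a difference at least as extreme as the one measured if $H_0$ were true (it is not the probability that $H_0$ is true)

- In this case: the pvalue is far below any commonly used threshold, so $H_0$ is rejected -> the difference is statistically significant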
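
To make "difference in units of standard error" concrete, the statistic can be recomputed by hand. The sketch below assumes the pooled-variance form of the test, which is what `stats.ttest_ind` uses with its default `equal_var=True`; the names `pooled_var`, `std_err`, `t_manual`, and `p_manual` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Same simulated data as in the "Sample data" section.
np.random.seed(1)
a = np.random.binomial(1, 0.08, size=100000)
np.random.seed(1)
b = np.random.binomial(1, 0.12, size=150000)

n_a, n_b = len(a), len(b)
diff = b.mean() - a.mean()  # difference between the two observed rates

# Pooled sample variance across both groups.
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
# Standard error of the difference between the two means.
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = diff / std_err
# Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n_a + n_b - 2)

t_scipy, p_scipy = stats.ttest_ind(b, a)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The hand-computed statistic should agree with the one returned by `stats.ttest_ind`, which shows that the t-statistic is simply the observed difference divided by its standard error.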

Common pvalue thresholds (reject $H_0$ when the pvalue falls below the chosen one):

- 10%: pvalue < 0.1
- 5%: pvalue < 0.05
- 1%: pvalue < 0.01

## Another example

In [6]:
# Simulate data, this time with only 100 customers per version
np.random.seed(1)
a = np.random.binomial(1, 0.08, size=100)

np.random.seed(1)
b = np.random.binomial(1, 0.12, size=100)

In [7]:
# Compute CTR
rate_a = np.mean(a)
rate_b = np.mean(b)

print(rate_a)
print(rate_b)

0.07
0.13


In [8]:
from scipy import stats
stats.ttest_ind(b, a)

Ttest_indResult(statistic=1.4142135623730951, pvalue=0.15886970489441132)
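
This second run uses the same underlying rates (8% and 12%) but only 100 customers per version, and its pvalue is much larger. As a minimal sketch, the two pvalues reported in this notebook can be checked against the common thresholds; `is_significant` and `p_values` are hypothetical names introduced for this illustration.

```python
# pvalues copied from the two ttest_ind outputs in this notebook.
p_values = {
    "100K vs 150K customers": 3.075864376906776e-224,
    "100 vs 100 customers": 0.15886970489441132,
}

def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the pvalue is below the chosen threshold alpha."""
    return p_value < alpha

for label, p in p_values.items():
    # Check the same pvalue against the 10%, 5% and 1% thresholds.
    decisions = {alpha: is_significant(p, alpha) for alpha in (0.10, 0.05, 0.01)}
    print(label, decisions)
```

The first pvalue is below all three thresholds, while the second (about 0.159) is above even the 10% threshold, so with 100 customers per version $H_0$ cannot be rejected even though the observed rates (7% vs 13%) look different.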