# A/B Test 

Use an A/B test to evaluate a new product or a new feature online.


Before going into the technical details, we have to pay attention to the policy and ethics of experiments. To respect them, we should answer these questions:
`1` Risk: What risk is the participant being exposed to?
`2` Benefit: What benefits might result from the study?
`3` Choice: What other choices do participants have?








**Steps**:
- Randomly split users into 2 sets: the control set (existing feature) and the experiment set (new feature)
- Compare how users in the two sets respond to determine the best version of the feature.
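
The random split described above can be sketched as follows. This is a minimal illustration, assuming users are identified by a string ID and that a hash-based bucket is acceptable as the randomization mechanism; the function name `assign_group`, the experiment label, and the 50/50 share are hypothetical.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "new_feature", exp_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'experiment'.

    Hashing (experiment, user_id) gives each user a stable bucket, so the
    same user always sees the same version during the test.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "experiment" if bucket < exp_share else "control"

# Hypothetical user IDs, only to check that the split is roughly balanced
users = [f"user_{i}" for i in range(10_000)]
groups = [assign_group(u) for u in users]
share_exp = groups.count("experiment") / len(groups)
print(f"share in experiment group: {share_exp:.3f}")
```

A deterministic hash is only one possible choice; drawing a random number once per user and storing the assignment would work as well.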

**We can't use A/B Test when**:
- Results would take too long to obtain
- No data is available for the experiment

**In practice**:
* Construct the user flow (the customer funnel)
* Choose a metric:
    - Click through rate: `nbClick/nbPageView`, to measure the usability of a feature
    - Click through probability: `nbUniqueVisitorWhoClick/nbUniqueVisitorToPage`, to measure the impact of a feature
* Perform experiment sizing
* Analyze results
* Draw conclusion
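
To make the difference between the two metrics concrete, here is a small sketch on a made-up event log. The log format (pairs of visitor ID and event type) and the visitor names are assumptions used only for illustration.

```python
# Hypothetical event log: (visitor_id, event_type)
events = [
    ("alice", "view"), ("alice", "click"), ("alice", "view"), ("alice", "click"),
    ("bob", "view"),
    ("carol", "view"), ("carol", "view"), ("carol", "click"),
]

page_views = sum(1 for _, e in events if e == "view")
clicks = sum(1 for _, e in events if e == "click")
visitors = {v for v, e in events if e == "view"}   # unique visitors to the page
clickers = {v for v, e in events if e == "click"}  # unique visitors who clicked

# Rate counts every click and every page view
click_through_rate = clicks / page_views
# Probability counts each visitor at most once
click_through_probability = len(clickers & visitors) / len(visitors)

print(f"click through rate:        {click_through_rate:.2f}")
print(f"click through probability: {click_through_probability:.2f}")
```

Repeated clicks by the same visitor raise the rate but not the probability, which is why the two metrics answer different questions.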



#### Hypotheses testing
How likely is it that my result was obtained by chance? I have to calculate P(results due to chance).

So we need a hypothesis of what the result would be if the experiment had no effect - this is called the **NULL HYPOTHESIS (H0)**

If the experiment has no effect, the probability for the control group is equal to the probability for the experiment group. In other words, the difference between the two probabilities is zero.

So **H0 : Pcont = Pexp, or Pcont-Pexp = 0**

We also need a hypothesis of what the result would be if the experiment has an effect, which is the opposite of H0 - this is called the **ALTERNATIVE HYPOTHESIS (H1)**

So **H1 : Pcont-Pexp != 0**

Next steps:
* Measure Pcont & Pexp
* Calculate hyp = Pcont-Pexp
* Calculate the probability of getting a result at least as extreme as hyp by chance if H0 is true: P(hyp|H0)
* If we want to test the hypothesis at the 95% confidence level, alpha = 1-0.95 = 0.05
* If P(hyp|H0) < alpha, we reject H0 in favor of H1; otherwise we fail to reject H0
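
One way to build intuition for P(hyp|H0) is to simulate it. The sketch below assumes made-up counts (1000 users per group, 100 and 130 clicks) and estimates how often a gap at least as large as the observed one appears when both groups share the same click probability. It is a simulation-based illustration, not the analytical method used in the rest of these notes.

```python
import random

# Hypothetical counts, chosen only for illustration
N_cont, X_cont = 1000, 100   # users and clicks in the control group
N_exp, X_exp = 1000, 130     # users and clicks in the experiment group

# Both groups have the same size, so comparing click counts is
# equivalent to comparing click probabilities.
observed_gap = abs(X_exp - X_cont)
p_null = (X_cont + X_exp) / (N_cont + N_exp)  # shared click probability under H0

rng = random.Random(0)  # fixed seed so the run is reproducible
n_sims = 2000
extreme = 0
for _ in range(n_sims):
    # Simulate both groups under H0: every user clicks with probability p_null
    sim_cont = sum(rng.random() < p_null for _ in range(N_cont))
    sim_exp = sum(rng.random() < p_null for _ in range(N_exp))
    if abs(sim_exp - sim_cont) >= observed_gap:
        extreme += 1

p_value = extreme / n_sims  # estimate of P(result at least this extreme | H0)
alpha = 0.05
print(f"estimated p-value: {p_value:.3f}, reject H0: {p_value < alpha}")
```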

****
* TotalSuccess = total nb of successes across both groups
* TotalUsers = total nb of users across both groups
##### Pooled probability of a click:
$$\hat{P}_{pool} = \frac{TotalSuccess}{TotalUsers} = \frac{X_{exp} + X_{cont}}{N_{exp} + N_{cont}}$$

##### Pooled standard error of a click:
$$SE_{pool} = \sqrt{\hat{P}_{pool} \cdot (1-\hat{P}_{pool}) \cdot \left(\frac{1}{N_{cont}} + \frac{1}{N_{exp}}\right)}$$

****
##### Difference between Pexp & Pcont :
$$ \hat{d} = \hat{P}_{exp} - \hat{P}_{cont} $$

****
Under the null hypothesis:
$$d = 0$$

So we expect (with $SE_{pool}$ as the standard deviation):
$$\hat{d} \sim \mathcal{N} (0,SE_{pool})$$

****
Comparison with the critical Z-score (1.96 for alpha = 0.05, two-sided)
- if:
$$\hat{d} > 1.96 \cdot SE_{pool}$$
- or:
$$\hat{d} < -1.96 \cdot SE_{pool}$$
we reject H0 and say that the difference is statistically significant. That is, we reject the claim that our experiment has no effect.
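
The formulas above can be combined into one small function. This is a sketch using only the Python standard library; the function name `two_proportion_z_test` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x_cont, n_cont, x_exp, n_exp, alpha=0.05):
    """Two-sided z-test for a difference in proportions, using the pooled SE."""
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    d_hat = x_exp / n_exp - x_cont / n_cont
    z = d_hat / se_pool
    # Two-sided p-value: probability of a |z| at least this large under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return {"d_hat": d_hat, "se_pool": se_pool, "z": z,
            "p_value": p_value, "reject_h0": abs(z) > z_crit}

# Hypothetical counts: 100/1000 clicks in control, 130/1000 in experiment
result = two_proportion_z_test(x_cont=100, n_cont=1000, x_exp=130, n_exp=1000)
print(result)
```

Comparing `abs(z)` with the critical value is the same check as comparing $\hat{d}$ with $\pm 1.96 \cdot SE_{pool}$, only expressed in units of standard errors.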

### Design Experiment 
##### Find the size (nb of data points, here nb of page views) by using statistical power. This refers to the minimum nb of data points needed to make sure that we have enough power to conclude, with high probability, that an interesting result is in fact statistically significant.

In [15]:
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize
import numpy as np
from scipy import stats

# current click through rate (baseline conversion rate)
cctr = 0.1

# we want at least a 2-percentage-point increase with the new feature
# (a common value from the business side)
practical_significance = 0.02 

# desired click through rate on the new experiment
dctr = cctr + practical_significance

# how many data points (page views) do we need to reliably detect that kind of change?
# We have to compute the statistical power

# leave out the "nobs1" parameter to solve for it
nip = NormalIndPower()

# effect size (Cohen's h) for going from cctr to dctr
pe = proportion_effectsize(cctr, dctr)

# the result is the number of observations needed in each group
nip.solve_power(effect_size = pe, alpha = .05, power = 0.8)

3834.5957398840183
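
As a cross-check of the value above, the usual normal-approximation formula for two equal-sized groups can be evaluated by hand. The sketch below assumes the effect size is Cohen's h and ignores the negligible opposite tail of the two-sided test, so it should land close to, but not exactly on, the `solve_power` result.

```python
from math import asin, sqrt
from statistics import NormalDist

cctr = 0.10   # baseline click through rate
dctr = 0.12   # desired click through rate
alpha = 0.05  # significance level (two-sided)
power = 0.80  # desired statistical power

# Cohen's h: difference of arcsine-transformed proportions
h = 2 * asin(sqrt(dctr)) - 2 * asin(sqrt(cctr))

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
z_power = NormalDist().inv_cdf(power)          # about 0.84

# Observations needed in each group (normal approximation)
n_per_group = 2 * ((z_alpha + z_power) / h) ** 2
print(f"approximate sample size per group: {n_per_group:.0f}")
```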

* Once we know the minimum number of data points, we can collect the data and analyze it as follows

### Analyze

In [18]:
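# Observed data
N_cont = 10072 # total users in the control group
X_cont = 974 # total successes (clicks) in the control group
N_exp = 9886 # total users in the experiment group
X_exp = 1242 # total successes (clicks) in the experiment group
p_exp = X_exp/N_exp
p_cont = X_cont/N_cont

P_pool = (X_cont + X_exp) / (N_cont + N_exp)
SE_pool = np.sqrt(P_pool * (1-P_pool) * ((1/N_cont)+(1/N_exp)))

# observed difference (signed, so the direction of the effect is kept)
d = p_exp-p_cont

# 95% confidence interval around the observed difference d
interval_conf = stats.norm.interval(0.95, loc=d, scale=SE_pool)

print(f"\nConfidence Interval: {interval_conf}")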


Confidence Interval: (0.020210660302896456, 0.03764628777733894)
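
The interval above is built with the pooled standard error. A common alternative for the confidence interval of a difference is the unpooled standard error, which does not assume that both groups share the same probability. The sketch below recomputes the interval that way with the same counts, using only the standard library; it is shown for comparison and is not the method used in these notes.

```python
from math import sqrt
from statistics import NormalDist

# Same counts as in the analysis above
N_cont, X_cont = 10072, 974
N_exp, X_exp = 9886, 1242

p_cont = X_cont / N_cont
p_exp = X_exp / N_exp
d = p_exp - p_cont

# Unpooled standard error: each group keeps its own variance estimate
se_unpooled = sqrt(p_exp * (1 - p_exp) / N_exp + p_cont * (1 - p_cont) / N_cont)

z_crit = NormalDist().inv_cdf(0.975)  # two-sided 95% level
ci_low, ci_high = d - z_crit * se_unpooled, d + z_crit * se_unpooled
print(f"Unpooled 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

With groups of similar size and similar probabilities, the two standard errors are nearly identical, so the conclusion does not change here.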


### Draw conclusion

In [24]:
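# margin of error at the 95% level (1.96 is the two-sided critical z-value)
error_margin = SE_pool*1.96
# statistical significance: the observed difference exceeds the margin of error
if abs(d) > error_margin:
    print("Test statistically significant: The result will be enough to take a decision")

# practical significance: the whole interval lies above the 2% threshold
if interval_conf[0] >= practical_significance:
    print("\nYES we can launch the product 🚀")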

Test statistically significant: The result will be enough to take a decision

YES we can launch the product 🚀


##### A common value from the business side for the level of practical significance is 2%.
##### Referring to our confidence interval, the lower bound is about 2.02%, which is above that level, so we can launch the product.

##### In other words, we can be confident at the 95% level that the click-through probability changes by at least 2.02% in absolute terms, which exceeds the 2% practical significance level.
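
The launch rule used here can be written as a small helper that also covers the cases where the interval does not clear the threshold. This is a sketch under the assumption that only a positive change is of interest; the function name `launch_decision`, the parameter `d_min`, and the three outcome labels are hypothetical.

```python
def launch_decision(ci_low: float, ci_high: float, d_min: float) -> str:
    """Compare a confidence interval for the difference with the
    practical significance level d_min."""
    if ci_low >= d_min:
        # The whole interval is above the threshold
        return "launch"
    if ci_high < d_min:
        # Even the most optimistic value is below the threshold
        return "do not launch"
    # The interval straddles the threshold: the data cannot settle it
    return "inconclusive: collect more data"

# Interval from the analysis above, rounded
print(launch_decision(0.0202, 0.0376, d_min=0.02))
# Hypothetical intervals for the other two cases
print(launch_decision(-0.005, 0.010, d_min=0.02))
print(launch_decision(0.010, 0.030, d_min=0.02))
```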

In [25]:
error_margin

0.008717973932038056