# A/B Testing

## Introduction

One of the most popular applications of hypothesis testing is __A/B Testing__. It is a basic controlled experiment that compares two versions of a variable to find out which performs better in a controlled environment.

For instance, suppose you have recently made changes to your website. You have no way of knowing with certainty how the next 100,000 people who visit your website will behave. That information is not available today, and if you waited until those 100,000 people had visited the site, it would be too late to optimize their experience.

Instead, you can split what you offer into two versions: A and B. Version A remains unchanged, while version B carries the changes you want to evaluate. Then, based on the responses of the customer groups who received A and B respectively, you decide which one performs better.

<p align=center><img src=images/ab_test.png width=500></p>

The objective here is to check which option achieves the higher conversion rate on the website. We will use A/B testing and collect data to analyze which option performs better.

## Formulate a hypothesis

$\mathbf{H_{0}}$: From an A/B test perspective, the null hypothesis states that there is no difference between the control and variant groups. Here our $\mathbf{H_{0}}$ is "there is no difference in the conversion rate in customers receiving option A and B".

$\mathbf{H_{a}}$: The alternative hypothesis is what you hope your A/B test will show to be true. In our example, the $\mathbf{H_{a}}$ is "the conversion rate of customers receiving option B is higher than that of customers receiving option A".
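
The two hypotheses can also be written symbolically. The symbols $\mu_A$ and $\mu_B$ are notation assumed here for illustration (they do not appear elsewhere in this tutorial) and stand for the mean conversion rates of options A and B:

$$
H_{0}: \mu_B = \mu_A \qquad \text{versus} \qquad H_{a}: \mu_B > \mu_A
$$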

We have to collect enough evidence through our tests to reject the null hypothesis.

## Create Control Group and Test Group

The next step is to decide which group of customers will participate in the test. Here we have two groups: the Control Group and the Test (variant) Group.

The Control Group is the one that will receive newsletter A and the Test Group is the one that will receive newsletter B.
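
Assignment to the two groups is usually done at random so that the groups are comparable. The following is a minimal sketch of one way to do this, assuming a hypothetical array of user IDs; the function name `assign_groups` and the 50/50 split are illustrative choices, not something prescribed by this tutorial.

```python
import numpy as np


def assign_groups(user_ids, seed=0):
    """Randomly split user IDs into a control group (A) and a test group (B)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(user_ids)  # shuffled copy, input is left untouched
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical user IDs, used only for illustration
user_ids = np.arange(1000)
control, test = assign_groups(user_ids)
print(f"Control group size: {len(control)}, test group size: {len(test)}")
```

Fixing the seed makes the split reproducible, which helps when the assignment has to be audited later.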

## Statistical significance

How can we conclude from here that the Test group is working better than the control group?

To reject our null hypothesis, we have to establish the statistical significance of our test.

There are two types of errors that may occur in our hypothesis testing:

1. __Type I error__: We reject the null hypothesis when it is true. That is, we accept variant B when it is not performing better than A.
2. __Type II error__: We fail to reject the null hypothesis when it is false. That is, we conclude that variant B is no better when it actually performs better than A.

Statistical significance means that the observed difference between your control version and the test version is unlikely to be due to error or random chance alone. To assess the statistical significance of our experiment, we can use a two-sample t-test.
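
To make the Type I error concrete, the sketch below simulates many A/B experiments in which the null hypothesis is true by construction (both groups are drawn from the same distribution) and counts how often a t-test at a 0.05 significance level rejects it anyway. The distribution parameters, sample sizes, and number of simulated experiments are assumptions chosen only for illustration.

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(42)
alpha = 0.05          # significance level
n_experiments = 2000  # number of simulated A/B tests (assumed)
sample_size = 50      # observations per group (assumed)

# Both groups share the same mean, so any "significant" result is a false positive
group_a = rng.normal(loc=0.10, scale=0.02, size=(n_experiments, sample_size))
group_b = rng.normal(loc=0.10, scale=0.02, size=(n_experiments, sample_size))

# One t-test per simulated experiment (one per row)
result = ss.ttest_ind(group_b, group_a, axis=1)
type_1_rate = np.mean(result.pvalue < alpha)
print(f"Share of experiments that wrongly reject H0: {type_1_rate:.3f}")
```

The printed share should land close to the chosen significance level, which is exactly what the significance level controls: the long-run rate of Type I errors.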

The two-sample t-test is one of the most commonly used hypothesis tests. It is applied to test whether the means of two groups differ.
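
As a rough illustration of what the test computes, the sketch below derives the pooled-variance (equal variance) t-statistic by hand and compares it with `scipy.stats.ttest_ind`, which uses the same equal-variance form by default. The data is synthetic and the means, spread, and sample sizes are assumed values, not the tutorial's dataset.

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)
# Synthetic conversion rates (assumed values, for illustration only)
group_a = rng.normal(loc=0.10, scale=0.02, size=40)
group_b = rng.normal(loc=0.12, scale=0.02, size=40)

n_a, n_b = len(group_a), len(group_b)
var_a = group_a.var(ddof=1)  # sample variance of A
var_b = group_b.var(ddof=1)  # sample variance of B

# Pooled variance and standard error of the difference in means
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (group_b.mean() - group_a.mean()) / std_err
dof = n_a + n_b - 2
p_manual = 2 * ss.t.sf(abs(t_manual), dof)  # two-sided p-value

t_scipy, p_scipy = ss.ttest_ind(group_b, group_a)
print(f"manual: t={t_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.6f}")
```

The statistic is the difference in group means divided by its standard error, so it grows when the gap between the groups is large relative to the noise in the data.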


In [None]:
!pip install seaborn

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as ss
import matplotlib.pyplot as plt
# Load the A/B test results into a DataFrame
data = pd.read_csv("https://aicore-files.s3.amazonaws.com/Data-Science/ab_data.csv")

In [None]:
data

The data represents the percentage of users who clicked on an ad when they were presented with layout A, and the percentage who clicked on the ad when they were presented with layout B.

Let’s plot the distributions of the test and control groups:

In [None]:
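plt.figure("Test Plots")
# histplot with kde=True is used because distplot is deprecated in recent seaborn releases
sns.histplot(data.Conversion_A, kde=True, stat="density", label='A')
sns.histplot(data.Conversion_B, kde=True, stat="density", label='B')
plt.legend()
plt.show()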


We can use scipy to calculate the t-statistic and its p-value. The t-statistic measures the difference between the two group means relative to the variability in the data, and the p-value tells us whether that difference is statistically significant.

In [None]:
help(ss.ttest_ind)

In [None]:
# B is passed first, so a positive t-statistic means B has the higher mean
t_stat, p_val = ss.ttest_ind(data.Conversion_B, data.Conversion_A)
print(f'The t-test value is {t_stat}, and the p-value is {p_val}')

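Here, our p-value is less than the significance level of 0.05, so we can reject the null hypothesis. Note that `ss.ttest_ind` runs a two-sided test by default, so a small p-value indicates that the two groups differ, and a positive t-statistic (with B passed first) indicates that B has the higher mean conversion. This means that in our A/B test, option B is performing better than option A. Our recommendation would therefore be to replace the current option with B to improve the conversion rate on our website.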
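
Because the alternative hypothesis in this tutorial is one-sided (B is higher than A), a one-sided test is a closer match to it than the default two-sided test. The sketch below shows the difference using the `alternative` argument of `ttest_ind`, which is available in SciPy 1.6 and later. It runs on synthetic data with assumed parameters rather than on `ab_data.csv`, so the printed numbers are not the tutorial's results.

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(1)
# Synthetic conversion rates (assumed values, for illustration only)
group_a = rng.normal(loc=0.10, scale=0.02, size=100)
group_b = rng.normal(loc=0.11, scale=0.02, size=100)

# Default: two-sided test (H_a: the means differ in either direction)
t_two, p_two = ss.ttest_ind(group_b, group_a)

# One-sided test (H_a: the mean of the first sample, B, is greater)
t_one, p_one = ss.ttest_ind(group_b, group_a, alternative="greater")

print(f"two-sided: t={t_two:.4f}, p={p_two:.6f}")
print(f"one-sided: t={t_one:.4f}, p={p_one:.6f}")
```

When the t-statistic is positive, the one-sided p-value is half of the two-sided one, so the choice between the two should be made before looking at the data.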

There are many tools available for conducting A/B tests, but as a data scientist you should understand the factors at work behind them. You should also understand the statistics well enough to validate a test and establish its significance.