# Introduction
The main goal of A/B testing is to determine whether a change to a product or feature has a measurable impact on user behavior. The data comes from an e-commerce company that redesigned its web page to increase the number of paying customers and is running an A/B test to measure the impact of the new page. Users are split into two groups: the control group is shown the old page, while the experiment (treatment) group is shown the new page. The data collected during the experiment is used in a hypothesis test to check whether the difference in conversion between the two groups is statistically significant.

## Key Steps:
* Establishing the baseline test parameters, which may include the baseline conversion rate, significance level, and confidence level
* Hypothesis creation
* Assumption check
* Hypothesis testing
* Conclusion

## Creating Hypothesis
* **Null Hypothesis**: The conversion rate for the two pages is the same.
* **Alternative Hypothesis**: The conversion rate for the two pages is not the same.

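* The confidence level is 95%, so the significance threshold is:
         threshold = (1 - 0.95) = 0.05
* The null hypothesis is rejected if the p-value is below the threshold. Because the alternative hypothesis is two-sided, a rejection only shows that the conversion rates differ; the new page design is accepted only if the difference is also in its favor.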
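The decision rule above can be illustrated with a hand-rolled two-proportion z-test. This is a minimal sketch, not necessarily the test used later in the analysis: the helper `two_proportion_ztest` and the conversion counts are hypothetical and chosen only for illustration, and the pooled-variance z-test is assumed to be an appropriate test for the two conversion rates.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 1 - 0.95  # significance threshold implied by the 95% confidence level


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for equal conversion rates, using the pooled rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, NOT taken from the dataset
z, p_value = two_proportion_ztest(conv_a=600, n_a=5000, conv_b=680, n_b=5000)
reject_null = p_value < ALPHA
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

The sign of `z` indicates the direction of the difference (positive when group B converts better), which is what decides whether a rejected null hypothesis actually favors the new page. In statsmodels, `proportions_ztest` from `statsmodels.stats.proportion` performs a comparable test.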

### Dataset
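* **user_id:** unique user number
* **timestamp:** time at which the observation was recorded
* **group:** whether the user belongs to the control or the treatment group
* **landing_page:** page shown to the user (old_page or new_page)
* **converted:** sign-up status after viewing the page (1 = converted, 0 = not converted)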
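Given these columns, two sanity checks are natural before any testing: every control user should have seen the old page and every treatment user the new one, and each user should appear only once. The following is a minimal sketch of such checks on a small hypothetical frame that mimics the schema; the rows are invented for illustration and are not records from the dataset.

```python
import pandas as pd

# Hypothetical rows mimicking the schema above; not real records
sample = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4],
    "group": ["control", "treatment", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "new_page", "new_page"],
    "converted": [0, 1, 0, 0, 1],
})

# Page each group is expected to see
expected_page = sample["group"].map({"control": "old_page", "treatment": "new_page"})

# Rows where the page shown does not match the assigned group
mismatched = sample[sample["landing_page"] != expected_page]

# Rows belonging to users that appear more than once
duplicated = sample[sample["user_id"].duplicated(keep=False)]

print(len(mismatched), "mismatched rows;",
      duplicated["user_id"].nunique(), "repeated users")
```

How mismatched or repeated rows should be handled (dropped, deduplicated, or investigated) is a judgment call that depends on how the experiment was logged.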

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats 
import statsmodels.stats.api as sms 
import matplotlib.pyplot as plt 
import seaborn as sns 

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

## Exploratory Data Analysis

In [2]:
df = pd.read_csv('/kaggle/input/ecommerce-ab-testing-2022-dataset1/ecommerce_ab_testing_2022_dataset1/ab_data.csv')
df.head()

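|   | user_id | timestamp | group | landing_page | converted |
|---|---------|-----------|-------|--------------|-----------|
| 0 | 851104 | 11:48.6 | control | old_page | 0 |
| 1 | 804228 | 01:45.2 | control | old_page | 0 |
| 2 | 661590 | 55:06.2 | treatment | new_page | 0 |
| 3 | 853541 | 28:03.1 | treatment | new_page | 0 |
| 4 | 864975 | 52:26.2 | control | old_page | 1 |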


In [3]:
df.shape

(294480, 5)

### Setting Base Parameters
* **Effect Size:** A measure of the magnitude of the difference between the control and experiment groups. It is an input to the power analysis: the smaller the effect that must be detected, the larger the sample needed to detect it at a given power. The standard error of the estimated effect size reflects the uncertainty in the measurement.
* In this analysis, the effect of interest is a lift of 2 percentage points in the conversion rate.
* The power of the test is set at the conventional value of 0.8.
* The analysis uses a baseline conversion rate of 13%, so with a lift of 2 percentage points the expected conversion rate for the new page is 15%.

The next cell calculates the required sample size for each group.

In [4]:
# Effect size (Cohen's h) for the baseline rate (13%) vs. the expected rate (15%)
effect_size = sms.proportion_effectsize(0.13,0.15)

# Required sample size per group (ratio=1 means equally sized groups)
sample_size = sms.NormalIndPower().solve_power(effect_size, alpha=0.05, power=0.8, ratio=1)

sample_size = np.ceil(sample_size)
print(sample_size)

4720.0
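As a cross-check on the figure above, the same quantity can be approximated by hand. This is a sketch under stated assumptions: the effect size is taken to be Cohen's h (the difference of arcsine-transformed proportions), the groups are equally sized, and the closed-form normal approximation is used, which drops a negligible far-tail term and may therefore differ very slightly from the solver's result.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

baseline, expected = 0.13, 0.15  # rates passed to proportion_effectsize above
alpha, power = 0.05, 0.8

# Cohen's h: difference of arcsine-transformed proportions
h = 2 * asin(sqrt(baseline)) - 2 * asin(sqrt(expected))

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
z_power = NormalDist().inv_cdf(power)          # quantile for the target power

# Per-group size for two equally sized groups (closed-form approximation)
n_per_group = ceil(2 * ((z_alpha + z_power) / h) ** 2)
print(f"Cohen's h = {h:.4f}, approximate n per group = {n_per_group}")
```

The result should land close to the sample size printed above. The figure is per group, so the experiment as a whole needs roughly twice that many users.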
