In [18]:
import numpy as np
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind, shapiro, mannwhitneyu
import warnings

warnings.filterwarnings("ignore")

In [20]:
df = pd.read_csv(r"data.csv")
df

Unnamed: 0,USER_ID,VARIANT_NAME,REVENUE
0,737,variant,0.0
1,2423,control,0.0
2,9411,control,0.0
3,7311,control,0.0
4,6174,variant,0.0
...,...,...,...
9995,1981,control,0.0
9996,502,variant,0.0
9997,9214,variant,0.0
9998,7741,control,0.0


In [4]:
df.describe()

Unnamed: 0,USER_ID,REVENUE
count,10000.0,10000.0
mean,4981.0802,0.099447
std,2890.590115,2.318529
min,2.0,0.0
25%,2468.75,0.0
50%,4962.0,0.0
75%,7511.5,0.0
max,10000.0,196.01


In [5]:
# Split the data into control and variant groups
control = df.loc[df["VARIANT_NAME"] == "control"]
test = df.loc[df["VARIANT_NAME"] == "variant"]

In [14]:
control

Unnamed: 0,USER_ID,VARIANT_NAME,REVENUE
1,2423,control,0.0
2,9411,control,0.0
3,7311,control,0.0
6,2849,control,0.0
7,9168,control,0.0
...,...,...,...
9988,428,control,0.0
9994,3129,control,0.0
9995,1981,control,0.0
9998,7741,control,0.0


In [15]:
test

Unnamed: 0,USER_ID,VARIANT_NAME,REVENUE
0,737,variant,0.00
4,6174,variant,0.00
5,2380,variant,0.00
8,6205,variant,0.00
13,2529,variant,2.15
...,...,...,...
9991,8864,variant,0.00
9992,9303,variant,0.00
9993,2400,variant,0.00
9996,502,variant,0.00


# Step 1 - Formulating Hypotheses

Null Hypothesis (H0): μ1 = μ2 (There is no statistically significant difference in terms of revenue between the Control and Variant Groups)

Alternative Hypothesis (H1): μ1 ≠ μ2 (There is a statistically significant difference between the Control and Variant Groups in terms of revenue)
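
To make the compared quantities concrete, the sketch below computes per-group sample size, mean, and median revenue. It uses a small hypothetical DataFrame (`toy`) that only borrows the column names `VARIANT_NAME` and `REVENUE` from the dataset; the values are made up for illustration and are not taken from `data.csv`.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame({
    "VARIANT_NAME": ["control", "control", "control",
                     "variant", "variant", "variant"],
    "REVENUE": [0.0, 0.0, 3.0, 0.0, 1.5, 4.5],
})

# Per-group sample size, mean, and median revenue.
summary = toy.groupby("VARIANT_NAME")["REVENUE"].agg(["count", "mean", "median"])
print(summary)

# Observed difference in mean revenue (variant minus control).
mean_diff = summary.loc["variant", "mean"] - summary.loc["control", "mean"]
print(f"Difference in means: {mean_diff:.2f}")
```

The same `groupby` call could be applied to `df` directly; the hypothesis test then asks whether a difference like `mean_diff` is larger than chance alone would plausibly produce.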

# Step 2 - Checking assumptions

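When performing an A/B test, the assumptions behind the chosen statistical test should be checked to ensure the analysis is valid. Two assumptions matter for a parametric comparison of the two groups: normality and homogeneity of variance.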

### Normal Distribution (Shapiro-Wilk Test)

H0: The assumption of normal distribution is satisfied.

H1: The assumption of normal distribution is not satisfied.

$p < 0.05$: H0 REJECTED

$p > 0.05$: H0 NOT REJECTED

### Homogeneity of Variance

H0: Variances of the groups are homogeneous.

H1: Variances of the groups are not homogeneous.

$p < 0.05$: H0 REJECTED

$p > 0.05$: H0 NOT REJECTED
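
The notebook does not name a specific test for this assumption. One common choice is Levene's test, available as `scipy.stats.levene`. The sketch below assumes that test and runs it on synthetic samples (`group_a`, `group_b`, both hypothetical) whose spreads differ on purpose, so the decision rule above can be seen in action.

```python
import numpy as np
from scipy.stats import levene

# Synthetic samples (assumption: not the notebook's data) with unequal spread.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=0.0, scale=5.0, size=500)

# Levene's test; center="median" is the more robust variant for skewed data.
stat, p_value = levene(group_a, group_b, center="median")
print('Test Stat=%.3f, P-value=%.3f' % (stat, p_value))

# Apply the decision rule at the 0.05 significance level.
alpha = 0.05
variances_homogeneous = p_value > alpha
print("H0 NOT REJECTED" if variances_homogeneous else "H0 REJECTED")
```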

In [16]:
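# Shapiro-Wilk normality test on control-group revenue
stat, p_value = shapiro(control["REVENUE"])
print('Test Stat=%.3f, P-value=%.3f' % (stat, p_value))

# Shapiro-Wilk normality test on variant-group revenue
stat, p_value = shapiro(test["REVENUE"])
print('Test Stat=%.3f, P-value=%.3f' % (stat, p_value))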

Test Stat=0.018, P-value=0.000
Test Stat=0.027, P-value=0.000


Both p-values are below 0.05, so we reject H0 for each group: revenue does not follow a normal distribution in either group.
Because the normality assumption already fails, there is no need to assess the homogeneity of variances. We can proceed directly to a non-parametric test, specifically the Mann-Whitney U test.
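
The decision flow described in this step can be sketched as a single function. This is a minimal illustration, not the notebook's own code: `compare_groups` is a hypothetical helper, Levene's test is an assumed choice for the variance check, and the samples are synthetic skewed data rather than the revenue column.

```python
import numpy as np
from scipy.stats import levene, mannwhitneyu, shapiro, ttest_ind


def compare_groups(a, b, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from the assumption checks."""
    # Normality must hold in both groups for the parametric branch.
    normal = shapiro(a).pvalue > alpha and shapiro(b).pvalue > alpha
    if not normal:
        # Normality rejected: skip the variance check, use Mann-Whitney U.
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, p
    # Normality not rejected: check variances to pick the t-test flavour.
    equal_var = levene(a, b).pvalue > alpha
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    return ("ttest_ind" if equal_var else "ttest_ind (Welch)"), stat, p


# Synthetic, strongly right-skewed samples (assumption: illustration only).
rng = np.random.default_rng(42)
sample_a = rng.exponential(scale=1.0, size=300)
sample_b = rng.exponential(scale=1.0, size=300)

name, stat, p = compare_groups(sample_a, sample_b)
print('%s: Test stat = %.4f, P-value=%.4f' % (name, stat, p))
```

With skewed input like this, the normality check fails and the function falls through to the Mann-Whitney branch, which mirrors the path taken for the revenue data.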

# Step 3 - Performing the Mann-Whitney non-parametric test

In [17]:
# Mann-Whitney U test comparing revenue in the control and variant groups
test_stat, pvalue = mannwhitneyu(control["REVENUE"],
                                 test["REVENUE"])
print('Test stat = %.4f, P-value=%.4f' % (test_stat, pvalue))

Test stat = 12521564.0000, P-value=0.4783


Result: the p-value (0.4783) is greater than 0.05, so we fail to reject the null hypothesis. In this context, we conclude that there is no statistically significant difference in revenue between the Control and Variant groups.