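To check whether the results of an A/B test are statistically significant, we start by measuring the sample covariance between the two groups (A and B) for a given metric. Covariance describes how the two series move together; it is not a significance test by itself, so the t-test and rank-sum test further down are what decide significance. The sample covariance is calculated as follows: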

In [110]:
import pandas as pd
df = pd.read_csv('RMA A_B Test Data - rma_ABtest.csv')

In [111]:
df['date'] = pd.to_datetime(df['record_date'], format='%d/%m/%Y')

In [14]:
active_df = df[df['status'] == 'Active'].sort_values(by=['date'])

In [15]:
others_df = df[df['status'] == 'Others'].sort_values(by=['date'])

In [112]:
df.columns

Index(['record_date', 'status', 'ctr', 'conv_per_click', 'CVR',
       'revenue_per_visit', 'aov', 'date'],
      dtype='object')

In [125]:
import numpy as np

dimension = "aov"

group_a = active_df[dimension]
group_b = others_df[dimension]

cov = np.cov(group_a, group_b)[0][1]

In [126]:
print(cov)

1391.3104251496097
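
To make the number above easier to interpret, here is a minimal sketch of what `np.cov(...)[0][1]` computes. The arrays are synthetic values chosen for illustration (not the A/B test data), and `sample_cov` is a hypothetical helper name. Note that covariance pairs observations by position, so both series need the same length and the same ordering (for example, sorted by date).

```python
import numpy as np

# Synthetic values for illustration only; not the A/B test data.
a = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
b = np.array([20.0, 25.0, 19.0, 30.0, 21.0])


def sample_cov(x, y):
    """Hypothetical helper: unbiased sample covariance,
    sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("covariance needs paired samples of equal length")
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1))


manual = sample_cov(a, b)
# np.cov returns a 2x2 matrix; the off-diagonal entry is cov(a, b).
from_numpy = float(np.cov(a, b)[0, 1])
print(manual, from_numpy)
```

The value is expressed in the product of the two series' units, which is why its magnitude alone says little about significance.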


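The covariance value is expressed in squared units of the metric and has no critical value of its own, so it cannot be looked up in a t-distribution table. To test whether the two groups differ, we instead compare their means with a two-sample t-test, which yields a t-statistic and a p-value that can be judged against a significance level.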

In [127]:
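# Manual two-sample t-test, kept for reference (the next cell uses scipy directly).
# from scipy.stats import t

# alpha = 0.05  # significance level: 5% risk of a false positive when there is no real difference

# # degrees of freedom (named dof so it does not overwrite the DataFrame `df`)
# dof = len(group_a) + len(group_b) - 2

# # two-sided critical value
# cv = t.ppf(1.0 - alpha / 2, dof)

# # pooled-variance t-statistic for the difference in means
# sp2 = ((len(group_a) - 1) * group_a.var() + (len(group_b) - 1) * group_b.var()) / dof
# t_stat = (group_a.mean() - group_b.mean()) / np.sqrt(sp2 * (1 / len(group_a) + 1 / len(group_b)))

# # compare to critical value
# if abs(t_stat) > cv:
#     print("Results are statistically significant.")
# else:
#     print("Results are not statistically significant.")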


In [128]:
from scipy import stats

# Sample data
control = group_a
treatment = group_b

# Perform t-test
t, p = stats.ttest_ind(control, treatment)

# Check if results are statistically significant
alpha = 0.05 # significance level
if p < alpha:
    print("Results are statistically significant.")
else:
    print("Results are not statistically significant.")
print(t, p)

Results are not statistically significant.
1.0018634716879475 0.317903649622347
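
By default `stats.ttest_ind` assumes both groups have the same variance. As a hedged variation (not something the cells above ran), the sketch below shows Welch's t-test via `equal_var=False`, which drops that assumption, and reproduces its statistic by hand. The data are synthetic and the group sizes and spreads are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only: same mean, different spread and size.
rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=10.0, size=40)
treatment = rng.normal(loc=100.0, scale=25.0, size=60)

# Student's t-test (pooled variance), as used in the cell above.
t_pooled, p_pooled = stats.ttest_ind(control, treatment)

# Welch's t-test: does not assume equal variances.
t_welch, p_welch = stats.ttest_ind(control, treatment, equal_var=False)

# Welch's statistic by hand: difference in means over its standard error.
se = np.sqrt(control.var(ddof=1) / len(control) + treatment.var(ddof=1) / len(treatment))
t_manual = (control.mean() - treatment.mean()) / se

print("pooled:", t_pooled, p_pooled)
print("welch: ", t_welch, p_welch)
print("manual welch t:", t_manual)
```

When group sizes or variances differ noticeably, Welch's version is usually the more cautious choice.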


However, the t-test assumes that the data in each group are approximately normally distributed for its results to be valid. So let's check whether group A is normally distributed.

In [129]:
import numpy as np
from scipy import stats

x = group_a
k2, p = stats.normaltest(x)

alpha = 0.05
print("p = {:g}".format(p))
if p < alpha:  # null hypothesis: x comes from a normal distribution
    print("The null hypothesis can be rejected")
else:
    print("The null hypothesis cannot be rejected")

p = 0.296618
The null hypothesis cannot be rejected


With p ≈ 0.297, well above alpha = 0.05, the test does not reject normality for group A, so this check gives no evidence that the t-test's assumption is violated for this group.
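
The check above covers only group A. A minimal sketch of running the same kind of check on both groups is shown below; `normality_report` is a hypothetical helper, the samples are synthetic, and adding Shapiro-Wilk next to `normaltest` is an assumption made here for comparison rather than something the notebook ran.

```python
import numpy as np
from scipy import stats


def normality_report(samples, alpha=0.05):
    """Hypothetical helper (not part of scipy): run D'Agostino-Pearson and
    Shapiro-Wilk on each named sample and flag any rejection of normality."""
    report = {}
    for name, values in samples.items():
        values = np.asarray(values, dtype=float)
        _, p_dagostino = stats.normaltest(values)
        _, p_shapiro = stats.shapiro(values)
        report[name] = {
            "normaltest_p": float(p_dagostino),
            "shapiro_p": float(p_shapiro),
            "reject_normality": bool(p_dagostino < alpha or p_shapiro < alpha),
        }
    return report


# Synthetic data for illustration only: one normal sample, one heavily skewed.
rng = np.random.default_rng(1)
samples = {
    "group_a": rng.normal(loc=50.0, scale=5.0, size=200),
    "group_b": rng.exponential(scale=50.0, size=200),
}
report = normality_report(samples)
for name, row in report.items():
    print(name, row)
```

With the real data, `samples` would be something like `{"group_a": group_a, "group_b": group_b}`.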

When the data are not normally distributed, or when normality is in doubt, a non-parametric test such as the Wilcoxon rank-sum test (also known as the Mann-Whitney U test) can be used instead of, or alongside, a t-test. The rank-sum test compares two independent samples and does not assume that the data are normally distributed. Here is how to run it with the scipy.stats library in Python:

In [130]:
from scipy import stats

# Sample data
A = group_a
B = group_b

# Perform Wilcoxon rank-sum test
stat, p = stats.ranksums(A, B)

# Check if results are statistically significant
alpha = 0.05 # significance level
if p < alpha:
    print("Results are statistically significant.")
else:
    print("Results are not statistically significant.")


Results are not statistically significant.
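
The cell above prints only the verdict. As a sketch on synthetic data (the distributions and the shift of 1.0 are assumptions for illustration), the following prints the statistic and p-value as well, and runs `stats.mannwhitneyu` next to `stats.ranksums` for comparison. The two functions test the same hypothesis but compute the p-value slightly differently, so small discrepancies between them are expected.

```python
import numpy as np
from scipy import stats

# Synthetic, skewed data for illustration only; B is shifted upward by 1.0.
rng = np.random.default_rng(7)
A = rng.exponential(scale=1.0, size=200)
B = rng.exponential(scale=1.0, size=200) + 1.0

# Wilcoxon rank-sum test (normal approximation).
z_stat, p_ranksum = stats.ranksums(A, B)

# Mann-Whitney U test, two-sided alternative stated explicitly.
u_stat, p_mwu = stats.mannwhitneyu(A, B, alternative="two-sided")

alpha = 0.05  # significance level
print("rank-sum:     z =", z_stat, "p =", p_ranksum, "significant:", p_ranksum < alpha)
print("Mann-Whitney: U =", u_stat, "p =", p_mwu, "significant:", p_mwu < alpha)
```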


In summary, the covariance between the **active** and **others** groups was measured, and then both a t-test and a rank-sum test were performed to determine whether the A/B test had a significant impact on each metric. The rank-sum test was included because it does not assume that the values are normally distributed.

**Metrics that had impact: CTR, CPC**

**Metrics that had no impact: CVR, Revenue Per Visit, AOV**

Note: Both the t-test and the rank-sum test were performed at a significance level of 0.05 (95% confidence).
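
The cells above show the tests for `aov` only. One way the per-metric comparison could be repeated for every column is sketched below. `compare_groups` is a hypothetical helper, the DataFrame is synthetic (only `ctr` is given an artificial shift), and the column names are taken from the `df.columns` output earlier; this is not necessarily how the results listed above were produced.

```python
import numpy as np
import pandas as pd
from scipy import stats

METRICS = ["ctr", "conv_per_click", "CVR", "revenue_per_visit", "aov"]


def compare_groups(df, metrics, group_col="status", a="Active", b="Others", alpha=0.05):
    """Hypothetical helper: run a t-test and a rank-sum test per metric."""
    rows = []
    for metric in metrics:
        x = df.loc[df[group_col] == a, metric].dropna()
        y = df.loc[df[group_col] == b, metric].dropna()
        _, p_t = stats.ttest_ind(x, y)
        _, p_r = stats.ranksums(x, y)
        rows.append({
            "metric": metric,
            "t_test_p": float(p_t),
            "rank_sum_p": float(p_r),
            "t_test_sig": bool(p_t < alpha),
            "rank_sum_sig": bool(p_r < alpha),
        })
    return pd.DataFrame(rows)


# Synthetic data for illustration only; not the A/B test data.
rng = np.random.default_rng(42)
n = 50
demo = pd.DataFrame({"status": ["Active"] * n + ["Others"] * n})
for metric in METRICS:
    demo[metric] = rng.normal(loc=1.0, scale=0.1, size=2 * n)
# Give one metric an artificial lift so it shows up as significant.
demo.loc[demo["status"] == "Active", "ctr"] += 0.5

report = compare_groups(demo, METRICS)
print(report)
```

Running five tests at alpha = 0.05 raises the chance of at least one false positive, so a multiple-comparison correction may be worth considering.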