<div align="center">

# Some A/B Testing methods in Python
</div>
<hr>

<br>

**Z-Test:** The z-test compares the means of two samples to determine whether the difference between them is statistically significant. It is similar to the t-test, but it assumes the samples are large enough (a common rule of thumb is at least 30 observations per group) for the test statistic to follow the normal distribution. The 10-observation samples below are for illustration only.

In [1]:
from statsmodels.stats.weightstats import ztest

sample_a = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
sample_b = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

z_stat, p_val = ztest(sample_a, sample_b)

print("z-statistic: ", z_stat)
print("p-value: ", p_val)

z-statistic:  0.8660254037844384
p-value:  0.38647623077123283
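
To make the numbers above less of a black box, here is a minimal sketch of a pooled-variance two-sample z-test written with only the Python standard library. The function name `two_sample_ztest` is hypothetical, and the pooled-variance formula is an assumption about what `ztest` does by default. With the samples above it gives the same z-statistic and p-value as the output shown.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_ztest(a, b):
    """Two-sided z-test for equal means using a pooled sample variance."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances (ddof=1)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    z_stat = (mean(a) - mean(b)) / std_err
    # Two-sided p-value from the standard normal distribution
    p_val = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_val


sample_a = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
sample_b = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

z_stat, p_val = two_sample_ztest(sample_a, sample_b)
print("z-statistic: ", z_stat)
print("p-value: ", p_val)
```

The difference in means (0.6 vs 0.4) is small relative to its standard error, which is why the p-value stays well above 0.05.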


<br>

**Chi-Square Test:** The chi-square test is used to determine if there is a significant association between two categorical variables. In A/B testing, the chi-square test can be used to compare the proportion of users who converted in version A versus version B.

In [3]:
from scipy.stats import chi2_contingency
import numpy as np

obs = np.array([[100, 50], [80, 70]])

chi_stat, p_val, dof, expected = chi2_contingency(obs)

print("chi-square statistic: ", chi_stat)
print("p-value: ", p_val)

chi-square statistic:  5.013888888888889
p-value:  0.025144761173357424
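
The statistic above can be checked by hand. The sketch below uses only the standard library: it derives the expected counts from the row and column totals, then computes the chi-square statistic with and without Yates' continuity correction, which `chi2_contingency` applies by default to 2x2 tables. The helper name `chi_square_2x2` is hypothetical, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom.

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                # Yates' correction shrinks each deviation by 0.5 (never below 0)
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Survival function of chi-square with 1 dof: P(X > stat) = erfc(sqrt(stat / 2))
    p_val = erfc(sqrt(stat / 2))
    return stat, p_val


obs = [[100, 50], [80, 70]]
stat_corrected, p_corrected = chi_square_2x2(obs, yates=True)
stat_plain, p_plain = chi_square_2x2(obs, yates=False)
print("with Yates correction:    ", stat_corrected, p_corrected)
print("without Yates correction: ", stat_plain, p_plain)
```

With the correction, the statistic and p-value match the output above. Without it, the statistic is slightly larger (about 5.56), so the corrected test is the more conservative of the two.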


<hr>

<br>

### Comparing More Than Two Groups (ANOVA and Kruskal-Wallis)

Dataset: seaborn tips dataset

* Question: Does the total bill differ significantly between days of the week?

* H0: Fri = Sat = Sun = Thur (there is no statistically significant difference in total bill between the four days)

* H1: The total bill on at least one day differs significantly from the others

> NOTE: You can find more detail about hypothesis testing in the **Bidding-AB-Test.ipynb** file.

In [119]:
import pandas as pd

df = pd.read_csv("tips.csv")

In [120]:
df.head()

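| Unnamed: 0 | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |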


In [121]:
df.groupby("day")["total_bill"].mean()

day
Fri     17.151579
Sat     20.441379
Sun     21.410000
Thur    17.682742
Name: total_bill, dtype: float64

In [123]:
from scipy.stats import shapiro

# Shapiro-Wilk test for normality, run separately for each day
for group in df["day"].unique():
    pvalue = shapiro(df.loc[df["day"] == group, "total_bill"])[1]
    print(group, 'p-value: %.4f' % pvalue)

Sun p-value: 0.0036
Sat p-value: 0.0000
Thur p-value: 0.0000
Fri p-value: 0.0409


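> H0 (the data are normally distributed) is rejected for every day because p < 0.05, so the normality assumption of ANOVA does not hold.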

In [124]:
from scipy.stats import levene

# Levene's test for homogeneity of variances across the four days
test_stat, pvalue = levene(df.loc[df["day"] == "Sun", "total_bill"],
                           df.loc[df["day"] == "Sat", "total_bill"],
                           df.loc[df["day"] == "Thur", "total_bill"],
                           df.loc[df["day"] == "Fri", "total_bill"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.6654, p-value = 0.5741


> H0 (the groups have equal variances) cannot be rejected because p > 0.05, so the variances can be treated as homogeneous.

In [125]:
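from scipy.stats import kruskal

# Normality was rejected, so use the non-parametric Kruskal-Wallis test instead of one-way ANOVA
kruskal(df.loc[df["day"] == "Thur", "total_bill"],
        df.loc[df["day"] == "Fri", "total_bill"],
        df.loc[df["day"] == "Sat", "total_bill"],
        df.loc[df["day"] == "Sun", "total_bill"])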

KruskalResult(statistic=10.403076391437086, pvalue=0.01543300820104127)

> H0 is rejected because p < 0.05: there is a statistically significant difference in total bill between the four days.
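
The three steps above (normality check, variance check, then the choice between ANOVA and Kruskal-Wallis) can be wrapped in one helper. This is a minimal sketch, not the notebook's own code: the function name `compare_groups` and the returned dictionary layout are hypothetical, and falling back to Kruskal-Wallis whenever either assumption fails is a simplifying assumption.

```python
from scipy.stats import f_oneway, kruskal, levene, shapiro


def compare_groups(groups, alpha=0.05):
    """Check ANOVA assumptions, then run one-way ANOVA or Kruskal-Wallis."""
    # Normality holds only if no group rejects the Shapiro-Wilk null hypothesis
    normal = all(shapiro(g)[1] >= alpha for g in groups)
    # Homogeneity holds if Levene's test does not reject equal variances
    equal_var = levene(*groups)[1] >= alpha
    if normal and equal_var:
        name, result = "anova", f_oneway(*groups)
    else:
        # Simplification: any failed assumption triggers the rank-based test
        name, result = "kruskal", kruskal(*groups)
    return {
        "test": name,
        "statistic": float(result[0]),
        "pvalue": float(result[1]),
        "reject_h0": bool(result[1] < alpha),
    }


# Usage with the tips data (assumes `df` is already loaded as above):
# groups = [g["total_bill"].to_numpy() for _, g in df.groupby("day")]
# print(compare_groups(groups))
```

Returning the name of the chosen test alongside the p-value keeps the decision visible, which matters because the two tests answer slightly different questions (means versus rank distributions).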

#### There is a statistically significant difference overall. Can we find out which pairs of days differ?

In [128]:
from statsmodels.stats.multicomp import MultiComparison
comparison = MultiComparison(df['total_bill'], df['day'])
tukey = comparison.tukeyhsd(alpha=0.05)  # family-wise error rate of 5%
print(tukey.summary())

Multiple Comparison of Means - Tukey HSD, FWER=0.05 
group1 group2 meandiff p-adj   lower   upper  reject
----------------------------------------------------
   Fri    Sat   3.2898 0.4541 -2.4799  9.0595  False
   Fri    Sun   4.2584 0.2371 -1.5856 10.1025  False
   Fri   Thur   0.5312 0.9957 -5.4434  6.5057  False
   Sat    Sun   0.9686 0.8968 -2.6088   4.546  False
   Sat   Thur  -2.7586 0.2374 -6.5455  1.0282  False
   Sun   Thur  -3.7273 0.0668 -7.6264  0.1719  False
----------------------------------------------------


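> `reject` is False for every pair, so H0 cannot be rejected for any of them: Tukey's HSD finds no statistically significant difference in mean total bill between **any two days** at the 5% level. The closest pair is Sun vs Thur (p-adj = 0.0668). Tukey's HSD compares means under the ANOVA assumptions, which is why it can disagree with the rank-based Kruskal-Wallis result above.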
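
Because the overall test used here was Kruskal-Wallis, a rank-based pairwise follow-up is one possible alternative to Tukey's HSD. The sketch below is not part of the original analysis: it swaps in pairwise Mann-Whitney U tests with a Bonferroni correction, and the function name `pairwise_mannwhitney` and its return layout are hypothetical.

```python
from itertools import combinations

from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(samples, alpha=0.05):
    """Pairwise two-sided Mann-Whitney U tests with Bonferroni correction.

    `samples` maps a group label to a sequence of observations.
    """
    pairs = list(combinations(samples, 2))
    results = {}
    for g1, g2 in pairs:
        p_raw = mannwhitneyu(samples[g1], samples[g2], alternative="two-sided")[1]
        # Bonferroni: multiply by the number of comparisons, capped at 1
        p_adj = min(float(p_raw) * len(pairs), 1.0)
        results[(g1, g2)] = {"p_adj": p_adj, "reject": p_adj < alpha}
    return results


# Usage with the tips data (assumes `df` is already loaded as above):
# samples = {day: g["total_bill"].to_numpy() for day, g in df.groupby("day")}
# for pair, res in pairwise_mannwhitney(samples).items():
#     print(pair, res)
```

Bonferroni is deliberately conservative, so with six comparisons a pair needs a raw p-value below roughly 0.0083 to be flagged.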