### Introduction to A/B Testing

A/B testing, also known as split testing, is a method used to compare two versions of a product, webpage, or algorithm to determine which one performs better. In machine learning and data analysis, A/B testing is often used to evaluate the performance of different models, features, or strategies.
The basic idea is to randomly divide your user base or dataset into two groups:

- Group A: The control group, which receives the current version (A)
- Group B: The treatment group, which receives the new version (B)

You then collect data on how each group performs according to your chosen metric(s) and use statistical analysis to determine if there's a significant difference between the two groups.
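The random split described above can be sketched as follows. This is a minimal illustration, not a production assignment scheme: the helper name `assign_groups` and the seed value are assumptions introduced here, and the 50/50 split is one possible design choice.

```python
import numpy as np


def assign_groups(user_ids, seed=0):
    """Randomly assign each user to 'A' or 'B' with a (near) 50/50 split.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    rng = np.random.default_rng(seed)
    user_ids = np.asarray(user_ids)
    shuffled = rng.permutation(len(user_ids))  # random order of positions
    groups = np.empty(len(user_ids), dtype="<U1")
    half = len(user_ids) // 2
    groups[shuffled[:half]] = "A"  # first half of the shuffle -> control
    groups[shuffled[half:]] = "B"  # remainder -> treatment
    return dict(zip(user_ids.tolist(), groups.tolist()))


assignment = assign_groups(range(1, 11))
print(assignment)
```

Shuffling and then cutting the list in half keeps the group sizes balanced, whereas an independent coin flip per user (as in the sample dataset) gives group sizes that vary from run to run.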

### Sample Dataset

Let's create a sample dataset for an e-commerce website that wants to test two different recommendation algorithms:

- Algorithm A: The current recommendation system
- Algorithm B: A new recommendation system

Our metric will be the click-through rate (CTR) of recommended products.

In [1]:
import pandas as pd
import numpy as np

np.random.seed(42)

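# Generate sample data.
# Note: clicks are drawn independently of both group and impressions,
# so there is no true difference between A and B in this dataset.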
n_users = 1000
data = {
    "user_id": range(1, n_users + 1),
    "group": np.random.choice(["A", "B"], size=n_users),
    "impressions": np.random.randint(10, 100, size=n_users),
    "clicks": np.random.randint(0, 20, size=n_users),
}

df = pd.DataFrame(data)
df["ctr"] = df["clicks"] / df["impressions"]

print(df.head())


   user_id group  impressions  clicks       ctr
0        1     A           72       3  0.041667
1        2     B           26      16  0.615385
2        3     A           82       2  0.024390
3        4     A           42      12  0.285714
4        5     A           93       8  0.086022
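In the cell above, `clicks` and `impressions` are drawn independently, so a user can end up with more clicks than impressions (a CTR above 1), and both groups follow the same distribution. One possible alternative, sketched below, draws clicks from a binomial distribution conditioned on impressions. The true CTR values of 0.10 and 0.12 and the names `true_ctr` and `sim` are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_users = 1000

# Assumed "true" click-through rates, for illustration only.
true_ctr = {"A": 0.10, "B": 0.12}

group = rng.choice(["A", "B"], size=n_users)
impressions = rng.integers(10, 100, size=n_users)

# Each impression becomes a click with the group's true CTR,
# so clicks can never exceed impressions.
p = np.where(group == "A", true_ctr["A"], true_ctr["B"])
clicks = rng.binomial(impressions, p)

sim = pd.DataFrame(
    {
        "user_id": range(1, n_users + 1),
        "group": group,
        "impressions": impressions,
        "clicks": clicks,
    }
)
sim["ctr"] = sim["clicks"] / sim["impressions"]

print(sim.head())
```

With data generated this way, every per-user CTR lies between 0 and 1, and a test has a real effect to detect.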


### A/B Testing

The next cell separates the data into the two groups, calculates the mean CTR for each group, and performs a two-sample t-test to compare the means.



In [2]:
import scipy.stats as stats

# Separate the groups
group_a = df[df['group'] == 'A']
group_b = df[df['group'] == 'B']

# Calculate mean CTR for each group
ctr_a = group_a['ctr'].mean()
ctr_b = group_b['ctr'].mean()

print(f"Mean CTR for Group A: {ctr_a:.4f}")
print(f"Mean CTR for Group B: {ctr_b:.4f}")

# Perform an independent two-sample t-test
# (ttest_ind assumes equal variances by default)
t_statistic, p_value = stats.ttest_ind(group_a['ctr'], group_b['ctr'])

print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

Mean CTR for Group A: 0.2683
Mean CTR for Group B: 0.2468
T-statistic: 1.2296
P-value: 0.2191
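By default, `stats.ttest_ind` runs Student's t-test, which assumes both groups have the same variance. When that assumption is doubtful, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below compares the two on placeholder data; the arrays `ctr_a` and `ctr_b` are synthetic stand-ins, not the notebook's dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic per-user CTRs, used only to make the example self-contained.
ctr_a = rng.beta(2, 8, size=500)
ctr_b = rng.beta(2, 6, size=480)

# Student's t-test: assumes equal variances in both groups.
student = stats.ttest_ind(ctr_a, ctr_b)

# Welch's t-test: does not assume equal variances.
welch = stats.ttest_ind(ctr_a, ctr_b, equal_var=False)

print(f"Student: t={student.statistic:.4f}, p={student.pvalue:.4g}")
print(f"Welch:   t={welch.statistic:.4f}, p={welch.pvalue:.4g}")
```

When the group sizes and variances are similar, the two tests usually give nearly identical results; they diverge more when both differ.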


### Statistical Analysis

The t-test helps us determine if there's a statistically significant difference between the two groups. Here's how to interpret the results:

- T-statistic: Measures the difference between the two group means in units of its standard error. A larger absolute value means the observed difference is large relative to the variation in the data.
- P-value: The probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true. In A/B testing, the null hypothesis is typically that there's no difference between the groups.

    - If p-value < significance level (usually 0.05), we reject the null hypothesis and conclude that there's a significant difference between the groups.
    - If p-value ≥ significance level, we fail to reject the null hypothesis and cannot conclude that there's a significant difference. This is not proof that the two groups are equal.
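To make the two quantities above concrete, here is a sketch that computes the pooled-variance t-statistic and the two-sided p-value by hand and checks them against `scipy.stats.ttest_ind`. The function name `pooled_t_test` and the sample arrays are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Two-sample Student's t-test with pooled variance (two-sided)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    dof = n1 + n2 - 2

    # Pooled sample variance across both groups.
    pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / dof

    # Standard error of the difference in means.
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

    t_stat = (x.mean() - y.mean()) / se

    # Two-sided p-value: probability of |T| >= |t_stat| under the null.
    p_val = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, p_val


rng = np.random.default_rng(1)
x = rng.normal(0.25, 0.10, size=200)  # synthetic sample, illustration only
y = rng.normal(0.27, 0.10, size=220)

t_manual, p_manual = pooled_t_test(x, y)
t_scipy, p_scipy = stats.ttest_ind(x, y)

print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```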

In [3]:
alpha = 0.05  # Significance level

print("\nInterpretation:")
if p_value < alpha:
    print("There is a statistically significant difference between the two groups.")
    if ctr_b > ctr_a:
        print(
            "Group B (new algorithm) performs better than Group A (current algorithm)."
        )
    else:
        print(
            "Group A (current algorithm) performs better than Group B (new algorithm)."
        )
else:
    print("There is no statistically significant difference between the two groups.")

# Calculate the relative improvement of B over A, in percent
# (a negative value means B's mean CTR is lower than A's)
relative_improvement = (ctr_b - ctr_a) / ctr_a * 100
print(f"\nRelative improvement: {relative_improvement:.2f}%")



Interpretation:
There is no statistically significant difference between the two groups.

Relative improvement: -7.98%


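### Confidence Intervals

A confidence interval gives a range of plausible values for each group's mean CTR. The cell below computes 95% intervals using the t-distribution.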

In [4]:
from scipy.stats import t

# Return (mean, lower bound, upper bound) of a t-based
# confidence interval for the mean of `data`
def confidence_interval(data, confidence=0.95):
    n = len(data)
    m = np.mean(data)
    std_err = stats.sem(data)
    h = std_err * t.ppf((1 + confidence) / 2, n - 1)
    return m, m - h, m + h


ci_a = confidence_interval(group_a["ctr"])
ci_b = confidence_interval(group_b["ctr"])

print("\nConfidence Intervals (95%):")
print(f"Group A: {ci_a[0]:.4f} ({ci_a[1]:.4f} - {ci_a[2]:.4f})")
print(f"Group B: {ci_b[0]:.4f} ({ci_b[1]:.4f} - {ci_b[2]:.4f})")


Confidence Intervals (95%):
Group A: 0.2683 (0.2434 - 0.2931)
Group B: 0.2468 (0.2233 - 0.2704)
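Comparing two separate intervals by eye is only an indirect check; whether they overlap is not a formal test. A more direct summary is a confidence interval for the difference in means. The sketch below uses the Welch approximation for the degrees of freedom; the function name `diff_confidence_interval` and the synthetic arrays are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def diff_confidence_interval(a, b, confidence=0.95):
    """Welch confidence interval for mean(b) - mean(a).

    Returns (difference, lower bound, upper bound).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Squared standard error of each group's mean.
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)

    diff = b.mean() - a.mean()
    se = np.sqrt(va + vb)

    # Welch-Satterthwaite degrees of freedom.
    dof = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))

    h = stats.t.ppf((1 + confidence) / 2, dof) * se
    return diff, diff - h, diff + h


rng = np.random.default_rng(2)
sample_a = rng.beta(2, 8, size=500)  # synthetic per-user CTRs
sample_b = rng.beta(2, 7, size=500)

diff, low, high = diff_confidence_interval(sample_a, sample_b)
print(f"Difference (B - A): {diff:.4f} ({low:.4f} - {high:.4f})")
```

If the 95% interval excludes zero, the corresponding two-sided Welch t-test rejects the null hypothesis at the 0.05 level, and vice versa.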
