# Hypothesis Testing - part 2

As stated in the first part, this notebook defines and illustrates A/B testing, which is central to day-to-day business operations, and examines the sensitivity of hypothesis tests. We will explore concepts such as Minimum Detectable Effect, CUPED, CUPAC, and other metric analysis techniques useful in A/B testing and data-driven decision making.

## Index:


**Libraries used:**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

## 1. A/B Testing

A/B testing compares two versions of a product, website, ad, or feature to determine which one performs better. Because sample sizes are usually large, the comparison is normally carried out as a two-sample Z-test. It is widely used in marketing, UX design, and product optimization.

We compare Group A (the control group), which serves as the baseline, with Group B, which receives the modified version. Users are usually split between the two groups at random to avoid bias; data is then collected and the test is run. Metrics used in this test include, for example:

- Conversion Rate (CR): Percentage of users completing a desired action.
- Click-Through Rate (CTR): Percentage of users who clicked on a link.
- Bounce Rate: Percentage of users who leave without engaging.
- Revenue per User: How much revenue is generated per visitor.
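
As a rough illustration of the random split and the metrics listed above, the sketch below assigns users to groups by hashing their ID and then computes each metric per group. The function names (`assign_group`, `summarize`), the experiment label, and the record fields are hypothetical and not part of any library. Hashing is one common way to get a stable, roughly 50/50 split; it is assumed here for illustration and is not taken from this notebook.

```python
import hashlib


def assign_group(user_id, experiment="cta_button"):
    """Deterministically map a user to 'A' or 'B' (hypothetical helper)."""
    key = f"{experiment}:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    # An even hash value goes to control, an odd one to treatment
    return "A" if int(digest, 16) % 2 == 0 else "B"


def summarize(rows):
    """Compute per-group metrics from per-visitor records.

    Each row is a dict with keys: group, clicked, converted, bounced (0/1)
    and revenue (float). The field names are assumptions for this sketch.
    """
    summary = {}
    for group in sorted({row["group"] for row in rows}):
        members = [row for row in rows if row["group"] == group]
        n = len(members)
        summary[group] = {
            "visitors": n,
            "conversion_rate": sum(r["converted"] for r in members) / n,
            "click_through_rate": sum(r["clicked"] for r in members) / n,
            "bounce_rate": sum(r["bounced"] for r in members) / n,
            "revenue_per_user": sum(r["revenue"] for r in members) / n,
        }
    return summary


# Tiny made-up dataset, only to show the shape of the input and output
rows = [
    {"group": "A", "clicked": 1, "converted": 0, "bounced": 0, "revenue": 0.0},
    {"group": "A", "clicked": 0, "converted": 0, "bounced": 1, "revenue": 0.0},
    {"group": "B", "clicked": 1, "converted": 1, "bounced": 0, "revenue": 20.0},
    {"group": "B", "clicked": 1, "converted": 0, "bounced": 0, "revenue": 0.0},
]

for group, metrics in summarize(rows).items():
    print(group, metrics)
```

Because the assignment depends only on the experiment label and the user ID, the same user always lands in the same group, which keeps the experience consistent across visits.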

The hypotheses are essentially the same as in every two-sample test: either there is no significant difference between the two versions, or there is. Because the samples are large and the metric under test is a proportion, the Central Limit Theorem applies and the test statistic is a Z-score:

$$
Z = \frac{p_B - p_A}{\sqrt{SE_A^2 + SE_B^2}}
$$

where $p_A$ and $p_B$ are the observed values of the metric in each group, usually rates (**proportions**); $SE_A$ and $SE_B$ are the standard errors of those proportions, $SE = \sqrt{p(1-p)/n}$ with $n$ the group's sample size; and the denominator is the standard error of the difference in proportions.
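
To make the standard error of a proportion less abstract, the following sketch simulates many experiments and compares the spread of the observed conversion rates with $\sqrt{p(1-p)/n}$. The true rate, sample size, and number of repetitions are illustrative assumptions, and only the standard library is used so the cell runs on its own.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

p_true = 0.05   # assumed true conversion rate (illustrative)
n = 500         # visitors per simulated experiment
reps = 2000     # number of simulated experiments

# Each simulated experiment yields one observed conversion rate
sample_props = [
    sum(rng.random() < p_true for _ in range(n)) / n
    for _ in range(reps)
]

# Spread of the simulated rates vs. the analytical standard error
empirical_se = statistics.stdev(sample_props)
formula_se = math.sqrt(p_true * (1 - p_true) / n)

print(f"Empirical SE: {empirical_se:.5f}")
print(f"Formula SE:   {formula_se:.5f}")
```

The two values should be close, which is what justifies plugging the analytical standard error into the Z-score instead of estimating it by repeating the experiment.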

Here is an example case:

A company wants to test two call-to-action (CTA) buttons:

- A (Control): "Sign Up Now"
- B (Treatment): "Get Started"

The test runs for a period of time, with the following results:

| Group | Visitors | Conversions | Conversion Rate |
|-------|----------|------------|----------------|
| A     | 10,000  | 500        | 5%             |
| B     | 10,000  | 600        | 6%             |

This results in the following **Hypothesis Test**, with a significance level of 5%:

- Null Hypothesis (H₀): There is no difference between the conversion rates of A and B ($p_A = p_B$).
- Alternative Hypothesis (H₁): The conversion rates differ ($p_A \neq p_B$).

In [2]:
# Data from the scenario
n_A = 10000  # Number of visitors in group A
n_B = 10000  # Number of visitors in group B
conv_A = 500  # Conversions in group A
conv_B = 600  # Conversions in group B

# Compute conversion rates
p_A = conv_A / n_A
p_B = conv_B / n_B

# Compute standard error for each group
se_A = np.sqrt((p_A * (1 - p_A)) / n_A)
se_B = np.sqrt((p_B * (1 - p_B)) / n_B)

# Compute standard error of the difference
se_diff = np.sqrt(se_A**2 + se_B**2)

# Compute Z-score
z_score = (p_B - p_A) / se_diff

# Compute p-value (two-tailed test); sf(x) = 1 - cdf(x), but more accurate in the tails
p_value = 2 * stats.norm.sf(abs(z_score))

# Print results
print(f"Conversion Rate A: {p_A:.4f}")
print(f"Conversion Rate B: {p_B:.4f}")
print(f"Z-score: {z_score:.2f}")
print(f"p-value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject H₀. There is a significant difference between the two variations.")
else:
    print("Conclusion: Fail to reject H₀. No significant difference between the variations.")

Conversion Rate A: 0.0500
Conversion Rate B: 0.0600
Z-score: 3.10
p-value: 0.0019
Conclusion: Reject H₀. There is a significant difference between the two variations.
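
A p-value says whether a difference is detectable, not how large it is. As a complement, the sketch below computes a 95% confidence interval for the difference in conversion rates from the same data, using the same unpooled standard error as above. It also shows a pooled-variance Z-score, a common textbook alternative that is not the method used in this notebook, to check that the conclusion does not hinge on that choice. Only the standard library is used so the cell runs on its own.

```python
import math
from statistics import NormalDist

# Same scenario as above
n_A, conv_A = 10000, 500
n_B, conv_B = 10000, 600
p_A, p_B = conv_A / n_A, conv_B / n_B
diff = p_B - p_A

# Unpooled standard error, as in the formula used in this section
se_diff = math.sqrt(p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B)

# 95% confidence interval for the difference in conversion rates
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low = diff - z_crit * se_diff
ci_high = diff + z_crit * se_diff

# Pooled variant: estimate a single rate under H0 (p_A = p_B)
p_pool = (conv_A + conv_B) / (n_A + n_B)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B))
z_pooled = diff / se_pool

print(f"Difference (B - A): {diff:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"Pooled Z-score: {z_pooled:.2f}")
```

The interval excludes zero, which agrees with rejecting H₀ at the 5% level, and the pooled Z-score is nearly identical to the unpooled one because both groups are the same size and their rates are close.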
