# A/B Testing in Practice: Tools & Techniques

### 1. Setting the Objective & Hypothesis

- Objective: Clearly define what you want to test.
- Hypothesis: A testable statement, specific and measurable.
  
  - Two-sample test: Compare performance (e.g., conversion rates) between two versions (A & B).
    - Null hypothesis (H0): No difference between the versions (μA = μB).
    - Alternative hypothesis (H1): The versions differ (μA ≠ μB).
  
  - One-sample test: Compare a single version's metric against a known benchmark or standard (H0: μ = μ0).
    - Simpler to run, but it relies on the benchmark still being valid; the two-sample comparison is the usual default for A/B tests.

### 2. Sample Size Determination

- Importance: Ensures test validity. Too small a sample gives unreliable results; too large a sample wastes time and traffic.
- Key factors:
  - Baseline measurement (e.g., current conversion rate)
  - Minimum Detectable Effect (MDE): smallest effect size you want to detect (e.g., a 5% increase; state whether it is relative or absolute)
  - Significance level (α): usually 0.05 (a 5% chance of a false positive when there is no true effect)
  - Statistical power (1 - β): usually 0.8 (an 80% chance of detecting a true effect at least as large as the MDE)
- Power analysis combines these inputs with the metric's variance to calculate the minimum sample size per group.
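
As a rough illustration of the power analysis above, the sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name `sample_size_per_group` and the example inputs (10% baseline, 2 percentage point absolute MDE) are illustrative assumptions, not values from a real experiment.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, mde_abs: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per group for a two-sided two-proportion test.

    baseline: current conversion rate (e.g. 0.10)
    mde_abs:  minimum detectable effect as an ABSOLUTE difference (e.g. 0.02)
    """
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of Bernoulli variances
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical inputs: 10% baseline, detect an absolute lift of 2 points.
n_per_group = sample_size_per_group(baseline=0.10, mde_abs=0.02)
print(f"Users needed per group: {n_per_group}")
```

Halving the MDE roughly quadruples the required sample, which is why the MDE is usually the input worth debating most carefully.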

### 3. Randomisation & Group Assignment

- Randomly assign users to control (A) or treatment (B) groups.
- Randomisation makes the groups comparable on average and reduces selection bias.
- Maintain roughly equal group sizes for better statistical power.
- Define the eligible population for the test.
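
One common way to implement the assignment is deterministic hashing rather than a live random draw, so that a returning user always lands in the same group. The sketch below is one possible approach under that assumption; `assign_variant`, the experiment name, and the user ID format are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str) -> str:
    """Map a user to 'A' (control) or 'B' (treatment) with a 50/50 split.

    Hashing the experiment name together with the user ID keeps the
    assignment stable per user and independent across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) % 100  # pseudo-random bucket in 0..99
    return "A" if bucket < 50 else "B"


# Hypothetical usage: the same user always gets the same answer.
variant = assign_variant("user-12345", "checkout-button-test")
print(variant)
```

Eligibility filtering is assumed to happen before this function is called, so only users in the defined population are ever assigned.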

### 4. Implementation

- Create control and treatment versions that differ in only one variable.
- Integrate A/B testing tools to randomly assign users.
- Implement tracking for user interactions.
- Test the setup thoroughly before rollout.
- Roll out gradually, starting with a small percentage.
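
The gradual rollout can be sketched as an exposure gate that admits only a percentage of eligible users into the experiment. This is a minimal illustration, assuming hash-based bucketing; the names `in_rollout` and `_bucket` and the `rollout` salt are hypothetical.

```python
import hashlib


def _bucket(key: str, buckets: int = 10_000) -> int:
    """Stable pseudo-random bucket in 0..buckets-1 derived from the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % buckets


def in_rollout(user_id: str, experiment: str, rollout_pct: float) -> bool:
    """Return True if the user is inside the current rollout percentage.

    rollout_pct is a percentage between 0 and 100. Because buckets are
    stable, raising the percentage only adds users and never removes
    anyone who was already exposed.
    """
    return _bucket(f"{experiment}:rollout:{user_id}") < rollout_pct * 100


# Hypothetical usage: start at 5%, later widen to 20%.
print(in_rollout("user-12345", "checkout-button-test", 5))
print(in_rollout("user-12345", "checkout-button-test", 20))
```

Using a separate salt for the rollout gate keeps exposure independent of the A/B split itself, so widening the rollout does not skew the group balance.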

### 5. Data Collection

- Identify KPIs (e.g., conversion rate, click-through rate).
- Collect consistent, accurate data for both groups.
- Monitor for anomalies or errors.
- Keep collecting until each group reaches its pre-computed sample size target.
- Store data securely and comply with privacy regulations.
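
One anomaly worth monitoring during collection is a group split that drifts from the intended ratio, which often points to an assignment or tracking bug. The sketch below is one way to check this, using a chi-squared goodness-of-fit test with one degree of freedom; `srm_p_value` and the example counts are illustrative assumptions.

```python
import math
from statistics import NormalDist


def srm_p_value(n_a: int, n_b: int, expected_share_a: float = 0.5) -> float:
    """P-value for observed group sizes versus the intended split.

    With two groups the chi-squared statistic has one degree of freedom,
    so its p-value can be computed from the standard normal distribution.
    """
    total = n_a + n_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((n_a - expected_a) ** 2 / expected_a
            + (n_b - expected_b) ** 2 / expected_b)
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


# Hypothetical counts: a very small p-value suggests the split is off.
p = srm_p_value(n_a=5000, n_b=5500)
print(f"Split check p-value: {p:.6f}")
```

A strict threshold (for example 0.001) is a common choice for this check, since a mismatch invalidates the comparison rather than merely weakening it.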

### 6. Statistical Analysis

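- Calculate means, variances, and standard deviations for each metric.
- Choose an appropriate statistical test (e.g., a t-test for approximately normal continuous metrics; a two-proportion z-test or chi-squared test for conversion rates).
- Calculate the p-value and compare it to the chosen α (e.g., 0.05); p < α indicates statistical significance.
- Compute confidence intervals (e.g., 95%) to show the precision of the estimated effect.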
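
For a conversion-rate metric, the analysis steps above can be sketched with a two-proportion z-test. This is a minimal standard-library version under the normal approximation; the function name `two_proportion_test` and the counts in the usage example are made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int,
                        alpha: float = 0.05) -> dict:
    """Two-sided z-test for the difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error: used for the test statistic under H0 (pA = pB).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval of the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {"diff": diff, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical counts: 400/4000 conversions in A, 480/4000 in B.
result = two_proportion_test(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
print(f"diff={result['diff']:.4f}, p={result['p_value']:.4f}, "
      f"95% CI=({result['ci'][0]:.4f}, {result['ci'][1]:.4f})")
```

The sketch assumes both groups are large and have at least some conversions; with very small counts an exact test would be a safer choice.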

### 7. Interpretation of Results

- Look beyond statistical significance to practical/business impact.
- Consider:
  - Magnitude of effect — is it meaningful?
  - Business implications — revenue, conversion, customer satisfaction
  - Costs and feasibility of implementation
  - Long-term impact and risks
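
The "magnitude of effect" question can be made concrete by comparing the confidence interval with the smallest effect the business would act on. The helper below is only a sketch of that idea: `assess_effect`, its labels, and the threshold are hypothetical, and it assumes a metric where higher is better and a positive threshold. Costs, feasibility, and long-term risks still need a separate judgement.

```python
def assess_effect(ci_low: float, ci_high: float,
                  min_meaningful_effect: float) -> str:
    """Classify a result from the confidence interval of (B minus A)."""
    if ci_high < 0:
        return "harmful"                      # whole interval below zero
    if ci_low <= 0:
        return "inconclusive"                 # interval includes zero
    if ci_low >= min_meaningful_effect:
        return "meaningful"                   # whole interval above threshold
    if ci_high < min_meaningful_effect:
        return "significant but small"        # real, but below threshold
    return "significant, size uncertain"      # interval straddles threshold


# Hypothetical interval and threshold (absolute conversion-rate difference).
print(assess_effect(ci_low=0.006, ci_high=0.034, min_meaningful_effect=0.005))
```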

### 8. Common Pitfalls & How to Avoid Them

- Only looking at statistical significance: Also assess practical significance.
- Ignoring external factors: Control for seasonality, trends, other changes.
- Misinterpreting causality: Only properly randomised assignment supports causal claims.
- Insufficient sample size: Ensure power and sample size are adequate.
- Cherry-picking results: Report all relevant outcomes transparently.
- Short-term focus: Consider long-term effects and conduct follow-ups.