# Experimental Design: A/B Testing in Fintech
This notebook demonstrates key concepts in experimental design using an A/B test scenario for a fintech (e-payment) app. We want to know if improving the onboarding flow reduces the drop-off rate.

## Business Situation & Hypotheses
**Situation:**
- The product team wants to know if a new onboarding flow for an e-payment app reduces the user drop-off rate compared to the old flow.

**Null Hypothesis (H₀):** The old and new onboarding screens have the same drop-off rate.

**Alternative Hypothesis (H₁):** The new onboarding flow reduces the drop-off rate.

## A/B Testing & Random Assignment
- **A/B Testing** is a randomized experiment comparing two versions (A: old, B: new) to measure the effect of a change.
- **Random Assignment** places each user in group A or B by chance, so user characteristics are balanced across groups on average and confounding is reduced. (Random *sampling*, by contrast, concerns how users are drawn from the wider population.)

In [1]:
import numpy as np
import pandas as pd
np.random.seed(42)
n_users = 1000
users = pd.DataFrame({
    'user_id': np.arange(n_users),
    'group': np.random.choice(['A', 'B'], size=n_users)
})
users['group'].value_counts()

# users.head()

group
B    510
A    490
Name: count, dtype: int64

## Sample Size & Statistical Power
- **Sample Size** affects the ability to detect a true effect. Too small a sample may miss real differences.
- **Statistical Power** is the probability of detecting an effect if it exists (commonly set at 80%).
- Power increases with larger sample size, larger effect size, and lower variance.
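
To make the link between effect size and sample size concrete, here is a minimal sketch that uses only the Python standard library. It assumes the effect is expressed as Cohen's h (the arcsine-transformed difference between two proportions), a one-sided test, and equal group sizes. The 30% and 22% drop-off rates are illustrative values matching the simulation later in this notebook, and the helper names `cohens_h` and `n_per_group` are hypothetical.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Standardized difference between two proportions (Cohen's h)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def n_per_group(h: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group for a one-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_power = NormalDist().inv_cdf(power)      # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / abs(h)) ** 2)


h = cohens_h(0.30, 0.22)  # assumed old vs. new drop-off rates
print(f"Cohen's h for 30% vs. 22%: {h:.3f}")
print(f"Users needed per group: {n_per_group(h)}")
print(f"Users needed per group if h were 0.1: {n_per_group(0.1)}")
```

Halving the standardized effect roughly quadruples the required sample, because the sample size scales with 1/h².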

In [2]:
from statsmodels.stats.power import NormalIndPower

# Standardized effect size (Cohen's h for two proportions),
# not a raw 10-percentage-point reduction in drop-off rate.
effect_size = 0.1
alpha = 0.05  # significance level
power = 0.8  # desired power
analysis = NormalIndPower()
# A positive effect size pairs with alternative='larger'. Combining it with
# 'smaller' makes 80% power unreachable, so the solver fails to converge.
sample_size = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power, alternative='larger')

print(f"Required sample size per group: {int(np.ceil(sample_size))}")

Required sample size per group: 1237


## Measurement Metrics
- **Drop-off Rate:** Percentage of users who do not complete onboarding.
- **Conversion Rate:** Percentage of users who complete onboarding.
- These metrics are compared between groups A and B.

In [8]:
# Simulate experiment outcomes
p_A = 0.30  # 30% drop-off in old flow
p_B = 0.22  # 22% drop-off in new flow (improved)
users['dropped_off'] = users.apply(lambda row: np.random.rand() < (p_A if row['group']=='A' else p_B), axis=1)
print(users.head())

results = users.groupby('group')['dropped_off'].mean()
print('Drop-off rates:')
print(results)


   user_id group  dropped_off
0        0     A        False
1        1     B        False
2        2     A         True
3        3     A         True
4        4     A        False
Drop-off rates:
group
A    0.293878
B    0.221569
Name: dropped_off, dtype: float64


## P-value & Confidence Interval
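- **P-value:** Probability of observing data at least as extreme as what was seen, assuming the null hypothesis is true. A p-value below the chosen significance level (α = 0.05 here) is taken as evidence against H₀.
- **Confidence Interval (CI):** Range of plausible values for an unknown quantity. A 95% CI comes from a procedure that captures the true value in about 95% of repeated experiments. The cell below reports a CI for group B's drop-off rate, not for the difference between groups.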

In [9]:
import statsmodels.api as sm
contingency = pd.crosstab(users['group'], users['dropped_off'])
print('Contingency Table:')
print(contingency)
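# One-sided two-proportion z-test. Counts are ordered [A, B], so
# alternative='larger' tests H1: drop-off rate in A > drop-off rate in B.
count = np.array([contingency.loc['A', True], contingency.loc['B', True]])
nobs = np.array([contingency.loc['A'].sum(), contingency.loc['B'].sum()])
stat, pval = sm.stats.proportions_ztest(count, nobs, alternative='larger')
# Normal-approximation (Wald) 95% CI for group B's drop-off rate only
ci_low, ci_upp = sm.stats.proportion_confint(count[1], nobs[1], alpha=0.05, method='normal')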
print(f"P-value: {pval:.6f}")
print(f"95% CI for B drop-off rate: ({ci_low:.3f}, {ci_upp:.3f})")

Contingency Table:
dropped_off  False  True 
group                    
A              346    144
B              397    113
P-value: 0.004450
95% CI for B drop-off rate: (0.186, 0.258)
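
Because the hypothesis concerns the difference between groups, an interval for that difference can be more informative than an interval for one group alone. The sketch below recomputes the pooled z-test by hand and adds a Wald 95% CI for the difference, using only the standard library and the counts from the contingency table above (144 of 490 in A, 113 of 510 in B). The Wald interval is an assumption made for this illustration; it is one of several reasonable choices and not the method the notebook itself uses.

```python
from math import sqrt
from statistics import NormalDist

# Observed counts taken from the contingency table above
drop_a, n_a = 144, 490
drop_b, n_b = 113, 510

rate_a, rate_b = drop_a / n_a, drop_b / n_b
diff = rate_a - rate_b  # positive when the new flow (B) loses fewer users

# Pooled standard error: used for the test because H0 assumes equal rates
p_pool = (drop_a + drop_b) / (n_a + n_b)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z_stat = diff / se_pool
p_one_sided = 1 - NormalDist().cdf(z_stat)

# Unpooled standard error: used for the Wald interval around the difference
se_diff = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
z_crit = NormalDist().inv_cdf(0.975)
diff_ci_low = diff - z_crit * se_diff
diff_ci_high = diff + z_crit * se_diff

print(f"Difference in drop-off rate (A - B): {diff:.3f}")
print(f"One-sided p-value: {p_one_sided:.6f}")
print(f"95% CI for the difference: ({diff_ci_low:.3f}, {diff_ci_high:.3f})")
```

An interval that lies entirely above zero points the same way as the small p-value: the data are hard to reconcile with equal drop-off rates.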


## Causal Inference Basics
- **Randomization** in A/B testing helps ensure that observed differences are due to the onboarding flow, not confounding factors.
- **Causal Effect:** The difference in drop-off rates can be interpreted as the causal effect of the new onboarding flow, assuming random assignment and no interference.
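
A quick simulation can illustrate why random assignment supports a causal reading. The sketch below invents a pre-existing user trait (`prior_sessions`, which is hypothetical and not part of this notebook's data) and checks that its average is similar in both groups after coin-flip assignment. All names and numbers are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)
n_sim_users = 10_000

# Hypothetical pre-treatment covariate: app sessions before onboarding
prior_sessions = [rng.gauss(20, 5) for _ in range(n_sim_users)]

# Coin-flip assignment that ignores the covariate entirely
assignment = [rng.choice("AB") for _ in range(n_sim_users)]

mean_a = mean(x for x, g in zip(prior_sessions, assignment) if g == "A")
mean_b = mean(x for x, g in zip(prior_sessions, assignment) if g == "B")

print(f"Mean prior sessions, group A: {mean_a:.2f}")
print(f"Mean prior sessions, group B: {mean_b:.2f}")
print(f"Imbalance (A - B): {mean_a - mean_b:.3f}")
```

Since assignment does not depend on the trait, any remaining imbalance is chance variation that shrinks as the number of users grows.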

## Type I and Type II Errors
- **Type I Error (False Positive):** Conclude the new flow reduces drop-off when it does not (rejecting H₀ incorrectly). Probability = α (e.g., 0.05).
- **Type II Error (False Negative):** Fail to detect a real improvement (not rejecting H₀ when H₁ is true). Probability = β, and power = 1 - β.
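
Both error rates can be estimated by simulating many experiments. The following is a rough Monte Carlo sketch under stated assumptions: 500 users per group, a one-sided pooled z-test at α = 0.05, and the 30% and 22% drop-off rates from this notebook's simulation. The function names `one_sided_p_value` and `rejection_rate` are hypothetical helpers, and only the standard library is used.

```python
import random
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(drop_a, n_a, drop_b, n_b):
    """P-value of the pooled z-test for H1: drop-off in A > drop-off in B."""
    p_pool = (drop_a + drop_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence against H0
    z = (drop_a / n_a - drop_b / n_b) / se
    return 1 - NormalDist().cdf(z)


def rejection_rate(p_a, p_b, n=500, alpha=0.05, n_sims=1000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        drop_a = sum(rng.random() < p_a for _ in range(n))
        drop_b = sum(rng.random() < p_b for _ in range(n))
        if one_sided_p_value(drop_a, n, drop_b, n) < alpha:
            rejections += 1
    return rejections / n_sims


# H0 true (no real difference): every rejection is a Type I error
type_i_rate = rejection_rate(0.30, 0.30)
# H1 true (real improvement): the rejection rate estimates power
power_est = rejection_rate(0.30, 0.22)

print(f"Estimated Type I error rate: {type_i_rate:.3f}")
print(f"Estimated power: {power_est:.3f}")
print(f"Estimated Type II error rate: {1 - power_est:.3f}")
```

The first estimate should land near α, and the second shows how often an experiment of this size would detect the assumed improvement.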

## Summary & Recommendations
- Use random assignment and a sufficient sample size to ensure reliable results.
- Analyze drop-off rates, p-values, and confidence intervals to make data-driven decisions.
- Understand the risks of Type I and II errors when interpreting results.
- A/B testing is a powerful tool for causal inference in product experiments.