🌟 Exercise 1: Calculating Required Sample Size
You are planning an A/B test to evaluate the impact of a new email subject line on the open rate. Based on past data, you expect the open rate to increase from 20% to 23%, and the exercise uses a standardized effect size of 0.3 for the calculation (measured as Cohen's h, a change from 20% to 23% is closer to 0.07, so treat 0.3 as a given input). You aim for an 80% chance (power = 0.8) of detecting this effect if it exists, with a 5% significance level (α = 0.05).

Calculate the required sample size per group using Python’s statsmodels library.
What sample size is needed for each group to ensure your test is properly powered?

In [3]:
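import math

from statsmodels.stats.power import TTestIndPower

effect_size = 0.3  # standardized effect size (Cohen's d)
alpha = 0.05       # significance level
power = 0.8        # desired probability of detecting the effect

analysis = TTestIndPower()

# Solve for the number of observations per group (equal group sizes by default)
sample_size = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power)
# Round up: truncating with int() would leave the test slightly under-powered
print(f"Required sample size per group: {math.ceil(sample_size)}")

Required sample size per group: 176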
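
The exercise mentions both a standardized effect size of 0.3 and a change in open rate from 20% to 23%. As a side check, the sketch below converts two proportions into Cohen's h, one common standardized effect size for proportions. The helper name `cohens_h` is made up for this illustration and uses only the standard library; statsmodels offers `proportion_effectsize` in `statsmodels.stats.proportion` for the same purpose.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference of the arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))

# Open rates from the exercise text
h = cohens_h(0.20, 0.23)
print(f"Cohen's h for 20% -> 23%: {h:.3f}")
```

The result is roughly 0.07, well below 0.3, so a test sized for an effect of 0.3 would be too small to reliably detect a three-point lift in the open rate.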


🌟 Exercise 2: Understanding the Relationship Between Effect Size and Sample Size
Using the same A/B test setup as in Exercise 1, you want to explore how changing the expected effect size impacts the required sample size.

Calculate the required sample size for the following effect sizes: 0.2, 0.4, and 0.5, keeping the significance level and power the same.
How does the sample size change as the effect size increases? Explain why this happens.


In [4]:
import math

effect_sizes = [0.2, 0.4, 0.5]
for effect_size in effect_sizes:
    # Same alpha and power as in Exercise 1; only the effect size changes
    sample_size = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power)
    # Round up so the test is not under-powered
    print(f"Effect size: {effect_size}, Required sample size per group: {math.ceil(sample_size)}")

Effect size: 0.2, Required sample size per group: 394
Effect size: 0.4, Required sample size per group: 100
Effect size: 0.5, Required sample size per group: 64


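The required sample size falls as the effect size grows: a larger difference stands out more clearly from random fluctuations, so fewer observations are needed to detect it. The drop is steep, because the sample size scales roughly with 1 / effect_size², so doubling the effect size from 0.2 to 0.4 cuts the requirement to about a quarter.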
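
The pattern in the numbers above can be read off the usual normal-approximation formula for a two-sided, two-sample test with equal group sizes. This is an approximation added for illustration (the t-test based calculation in statsmodels gives slightly larger values); d is the standardized effect size and z denotes standard normal quantiles.

```latex
n_{\text{per group}} \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{d^{2}},
\qquad \text{power} = 1 - \beta
```

With α and power held fixed the numerator is constant, so n is proportional to 1/d²: halving the effect size multiplies the required sample size by about four.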

🌟 Exercise 3: Exploring the Impact of Statistical Power
Imagine you are conducting an A/B test where you expect a small effect size of 0.2. You initially plan for a power of 0.8 but wonder how increasing or decreasing the desired power level impacts the required sample size.

Calculate the required sample size for power levels of 0.7, 0.8, and 0.9, keeping the effect size at 0.2 and significance level at 0.05.
Question: How does the required sample size change with different levels of statistical power? Why is this understanding important when designing A/B tests?

In [5]:
import math

powers = [0.7, 0.8, 0.9]
effect_size = 0.2
for power in powers:
    # Same effect size and alpha; only the desired power changes
    sample_size = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power)
    # Round up so the test is not under-powered
    print(f"Power: {power}, Required sample size per group: {math.ceil(sample_size)}")

Power: 0.7, Required sample size per group: 310
Power: 0.8, Required sample size per group: 394
Power: 0.9, Required sample size per group: 527


The higher the desired power, the larger the required sample size.
This matters when designing A/B tests because collecting samples usually consumes limited resources (traffic, time, budget). Knowing the trade-off lets you balance the sample size you can realistically acquire against the minimum power you consider sufficient.
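
The same trade-off can be viewed from the other side: given a sample size you can afford, what power do you get? The sketch below uses a normal approximation rather than the t-test based calculation above, so its values differ slightly from statsmodels; the function name `approx_power` and the sample sizes in the loop are made up for illustration.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative
    shift = effect_size * (n_per_group / 2) ** 0.5
    # The negligible probability of rejecting in the wrong direction is ignored
    return NormalDist().cdf(shift - z_crit)

for n in (200, 400, 600):
    print(f"n per group: {n}, approximate power: {approx_power(0.2, n):.2f}")
```

If the affordable sample size yields a power you consider too low, the options are to extend the test, accept a larger minimum detectable effect, or not run the test at all.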


🌟 Exercise 4: Implementing Sequential Testing
You are running an A/B test on two versions of a product page to increase the purchase rate. You plan to monitor the results weekly and stop the test early if one version shows a significant improvement.

Define your stopping criteria.
Decide how you would implement sequential testing in this scenario.
At the end of week three, Version B has a p-value of 0.02. What would you do next?

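Stopping criteria: a significance threshold fixed before the test starts. Because results are checked every week, an unadjusted p-value < 0.05 at each look would inflate the false-positive rate, so the per-look threshold should be corrected for the number of planned looks (for example Bonferroni: 0.05 divided by the number of looks).

Monitor results weekly, using cumulative data to calculate p-values.

Decision at week three:
    A p-value of 0.02 is below 0.05, so under an unadjusted rule the test would stop. Under a corrected rule it depends on the per-look threshold: with, say, four planned looks and Bonferroni (0.05 / 4 = 0.0125), 0.02 is not yet significant and the test continues to the next look.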
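
How much repeated weekly checks matter can be illustrated with a small simulation. This is a sketch under assumptions that are not part of the exercise: four weekly looks, 200 users per version per week, an identical 10% purchase rate in both versions (so every "significant" result is a false positive), and a pooled two-proportion z-test on cumulative data. All function names are made up for the example.

```python
import random
from statistics import NormalDist

def two_sided_p(conv_a, conv_b, n):
    """Two-sided p-value of a pooled two-proportion z-test with n users per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    if se == 0:
        return 1.0  # no variation observed, nothing to test
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(threshold, looks=4, users_per_week=200,
                        rate=0.10, runs=1000, seed=0):
    """Share of A/A experiments in which any weekly look falls below `threshold`."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        smallest_p = 1.0
        for _ in range(looks):
            # Both versions convert at the same rate: there is no real effect
            conv_a += sum(rng.random() < rate for _ in range(users_per_week))
            conv_b += sum(rng.random() < rate for _ in range(users_per_week))
            n += users_per_week
            smallest_p = min(smallest_p, two_sided_p(conv_a, conv_b, n))
        false_positives += smallest_p < threshold
    return false_positives / runs

naive = false_positive_rate(0.05)          # stop at any look with p < 0.05
corrected = false_positive_rate(0.05 / 4)  # Bonferroni-corrected threshold
print(f"False-positive rate, unadjusted: {naive:.3f}")
print(f"False-positive rate, Bonferroni: {corrected:.3f}")
```

Bonferroni is the simplest correction and tends to be conservative; group-sequential boundaries such as Pocock or O'Brien-Fleming are common alternatives.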

🌟 Exercise 5: Applying Bayesian A/B Testing
You’re testing a new feature in your app, and you want to use a Bayesian approach. Initially, you believe the new feature has a 50% chance of improving user engagement. After collecting data, your analysis suggests a 65% probability that the new feature is better.

Describe how you would set up your prior belief.
After collecting data, how does the updated belief (posterior distribution) influence your decision?
What would you do if the posterior probability was only 55%?

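Prior belief: a 50% chance that the new feature improves user engagement, i.e. no preference for either version before seeing data.
Updated belief: a 65% probability that the new feature is better. We can switch to it only if the decision threshold we set before the test allows it.
If the posterior probability were only 55%, the evidence would be too weak to decide, so we would need to collect more data.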
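
One way such a posterior probability could be computed is with a Beta-Binomial model, sketched below. This is an assumption for illustration, not necessarily the analysis behind the 65% in the exercise: engagement is treated as a yes/no outcome per user, both versions get the same Beta(1, 1) prior (which makes the prior probability that the new feature is better 50%), and the counts in the example call are invented.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, prior=(1, 1), draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta posteriors."""
    rng = random.Random(seed)
    a0, b0 = prior
    wins = 0
    for _ in range(draws):
        # Posterior of each rate: Beta(prior successes + observed, prior failures + observed)
        rate_a = rng.betavariate(a0 + conv_a, b0 + n_a - conv_a)
        rate_b = rng.betavariate(a0 + conv_b, b0 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: engaged users out of exposed users per version
posterior = prob_b_beats_a(conv_a=120, n_a=1000, conv_b=127, n_b=1000)
print(f"P(new feature is better): {posterior:.2f}")
```

The resulting probability is then compared with a threshold chosen before the test (for example 95%); values near 50% mean the data barely moved the prior.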

🌟 Exercise 6: Implementing Adaptive Experimentation
You’re running a test with three different website layouts to increase user engagement. Initially, each layout gets 33% of the traffic. After the first week, Layout C shows higher engagement.

Explain how you would adjust the traffic allocation after the first week.
Describe how you would continue to adapt the experiment in the following weeks.
What challenges might you face with adaptive experimentation, and how would you address them?

1. Allocate more traffic to the better-performing layout (C).
2. Keep optimising as the test goes on: if another layout starts to perform better or worse, add or remove traffic accordingly. If nothing changes during the next week, increase the share sent to Layout C further.
3. Challenges and how to address them:
    Early allocation changes might skew the results. Running a check, such as a Bayesian analysis, before changing the traffic allocation might help.
    Seeing a different layout might confuse some users, which adds variability to the results and should be taken into account.
    Taking too much traffic away from a layout leaves too little data to evaluate it reliably, so a minimum traffic threshold should be set for each layout.
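
The reallocation described above could be sketched with Thompson sampling, one common adaptive allocation technique (the exercise does not prescribe a method). Assumptions for illustration: engagement is a yes/no outcome per visitor, each layout starts from a Beta(1, 1) prior, every layout keeps a minimum 10% share, and the week-one counts are invented.

```python
import random

def thompson_allocation(successes, trials, floor=0.10, draws=10000, seed=0):
    """Traffic shares proportional to each layout's chance of being best, with a floor."""
    rng = random.Random(seed)
    k = len(successes)
    wins = [0] * k
    for _ in range(draws):
        # Draw one plausible engagement rate per layout from its Beta posterior
        samples = [rng.betavariate(1 + s, 1 + t - s) for s, t in zip(successes, trials)]
        wins[samples.index(max(samples))] += 1
    # Reserve `floor` for every layout, split the remaining traffic by win share
    free = 1 - floor * k
    return [floor + free * w / draws for w in wins]

# Hypothetical week-one data for layouts A, B, C: engaged visitors out of visitors
shares = thompson_allocation(successes=[30, 32, 45], trials=[1000, 1000, 1000])
for name, share in zip("ABC", shares):
    print(f"Layout {name}: {share:.0%} of next week's traffic")
```

Re-running the function each week on cumulative counts shifts traffic gradually, while the floor keeps enough data flowing to the weaker layouts to notice if they recover.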
