# 🌟 Exercise 1: Calculating Required Sample Size

You are planning an A/B test to evaluate the impact of a new email subject line on the open rate. Based on past data, you expect a small effect: an increase in the open rate from 20% to 23%, i.e. 3 percentage points, which corresponds to a Cohen's h of about 0.07. You aim for an 80% chance (power = 0.8) of detecting this effect if it exists, with a 5% significance level (α = 0.05).

- Calculate the required sample size per group using Python’s statsmodels library.
- What sample size is needed for each group to ensure your test is properly powered?


In [4]:
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Setting the parameters
alpha = 0.05  # Level of significance
power = 0.8   # Power of the test
p1 = 0.20     # Current open rate (control group)
p2 = 0.23     # Expected open rate (test group)

# Calculate the Cohen's h effect size (negative here because p1 < p2;
# the sign does not affect a two-sided test)
effect_size = proportion_effectsize(p1, p2)

# Calculate the required sample size per group
sample_size = NormalIndPower().solve_power(effect_size, power=power, alpha=alpha, ratio=1, alternative="two-sided")

# Display the result
print(f"Sample size required per group: {round(sample_size)}")

Sample size required per group: 2941
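
As a cross-check on the number above, the sketch below recomputes Cohen's h and the per-group sample size by hand, using only the Python standard library. It assumes the usual normal-approximation formula n = 2(z_alpha + z_power)² / h², which is not necessarily the exact routine statsmodels runs internally. The helper names `cohens_h` and `sample_size_per_group` are made up for this illustration.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference of the arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def sample_size_per_group(h, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group (two-sided test, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.8
    return 2 * (z_alpha + z_power) ** 2 / h ** 2


h = cohens_h(0.20, 0.23)  # negative because p1 < p2; only the magnitude matters here
n = sample_size_per_group(h)
print(f"Cohen's h: {h:.3f}")
print(f"Sample size per group: {round(n)}")
```

For these inputs the magnitude of h comes out near 0.07, and the rounded sample size is 2941, in line with the statsmodels result.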


# 🌟 Exercise 2: Understanding the Relationship Between Effect Size and Sample Size

Using the same A/B test setup as in Exercise 1, you want to explore how changing the expected effect size impacts the required sample size.

- Calculate the required sample size for the following effect sizes: 0.2, 0.4, and 0.5, keeping the significance level and power the same.
- How does the sample size change as the effect size increases? Explain why this happens.


In [None]:
from statsmodels.stats.power import NormalIndPower

# Same significance level and power as in Exercise 1
alpha = 0.05
power = 0.8

# List of effect sizes (Cohen's h)
effect_sizes = [0.2, 0.4, 0.5]

# Solve for the per-group sample size at each effect size
analysis = NormalIndPower()
sample_sizes = {}
for effect_size in effect_sizes:
    sample_size = analysis.solve_power(effect_size, power=power, alpha=alpha, ratio=1, alternative="two-sided")
    sample_sizes[effect_size] = round(sample_size)


for effect_size, size in sample_sizes.items():
    print(f"Sample size needed for an effect of {effect_size}: {size}")

Sample size needed for an effect of 0.2: 392
Sample size needed for an effect of 0.4: 98
Sample size needed for an effect of 0.5: 63
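
One way to see why the sample size falls so quickly is the normal-approximation formula for a two-sided test with equal group sizes. It is used here as an explanation of the trend, on the assumption that it is close to what the solver computes, not as a description of the solver's exact internals:

$$
n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{h^2}
$$

With α = 0.05 and power 1 − β = 0.8, the two quantiles are about 1.960 and 0.842, so the numerator is about 15.70. That gives n ≈ 15.70 / 0.2² ≈ 392, n ≈ 15.70 / 0.4² ≈ 98, and n ≈ 15.70 / 0.5² ≈ 63, matching the output above. Because n scales with 1/h², doubling the effect size from 0.2 to 0.4 cuts the required sample to a quarter: a larger effect stands out from random noise more easily, so fewer observations are needed to detect it.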


# 🌟 Exercise 3: Exploring the Impact of Statistical Power

Imagine you are conducting an A/B test where you expect a small effect size of 0.2. You initially plan for a power of 0.8 but wonder how increasing or decreasing the desired power level impacts the required sample size.

- Calculate the required sample size for power levels of 0.7, 0.8, and 0.9, keeping the effect size at 0.2 and significance level at 0.05.
- Question: How does the required sample size change with different levels of statistical power? Why is this understanding important when designing A/B tests?

As the power level increases, the required sample size also increases. This matters when designing an A/B test because higher power lowers the risk of missing a real effect, but it has to be paid for with more traffic or a longer test.

In [None]:
from statsmodels.stats.power import TTestIndPower

# Define parameters
effect_size = 0.2  # Small effect size
alpha = 0.05       # Significance level
power_levels = [0.7, 0.8, 0.9]  # Different power levels to evaluate


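# Power analysis for a two-sample t-test (equal group sizes by default)
analysis = TTestIndPower()

# Solve for the per-group sample size at each power level
sample_sizes = {}
for power in power_levels:
    sample_size = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power, alternative='two-sided')
    sample_sizes[power] = int(sample_size)  # int() truncates; rounding up would guarantee the target power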


for power, sample_size in sample_sizes.items():
    print(f"Power: {power}, Required Sample Size per Group: {sample_size}")

Power: 0.7, Required Sample Size per Group: 309
Power: 0.8, Required Sample Size per Group: 393
Power: 0.9, Required Sample Size per Group: 526
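
To make the cost of extra power concrete, here is a small sketch that expresses each sample size relative to the conventional 0.8 power level. It uses the normal-approximation formula rather than `TTestIndPower`, so its numbers are slightly smaller than the t-test results above. The function name `approx_n_per_group` is hypothetical.

```python
from statistics import NormalDist


def approx_n_per_group(effect_size, alpha, power):
    """Normal-approximation sample size per group (two-sided test, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 / effect_size ** 2


baseline = approx_n_per_group(0.2, 0.05, 0.8)
approx_sizes = {}
for power in [0.7, 0.8, 0.9]:
    n = approx_n_per_group(0.2, 0.05, power)
    approx_sizes[power] = n
    # Relative cost compared with the conventional 0.8 power level
    print(f"Power: {power}, approx. n per group: {n:.1f} ({n / baseline - 1:+.0%} vs. power 0.8)")
```

The relationship is not linear: moving from 0.8 to 0.9 costs more additional users than moving from 0.7 to 0.8.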


# 🌟 Exercise 4: Implementing Sequential Testing

I'll stop the test early if one version clearly outperforms the other with a p-value below 0.05, but only if I've collected enough data to be confident in the results.

Before making any decisions, I need to reach a minimum sample size per group, based on a power analysis, to ensure the results are reliable.

It's not just about statistical significance: I also want to see a meaningful improvement in conversion rate (an increase of at least 5%) to justify making a change.

Finally, if I do see a significant difference, I’ll wait at least two weeks to make sure the trend holds before making a final call.
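
The four rules above can be sketched as a single decision function. This is a minimal illustration, not a definitive implementation: the function names, the pooled two-proportion z-test, the reading of "5% increase" as a relative lift, and the example counts are all assumptions, and the minimum sample size of 2941 is borrowed from Exercise 1 purely as a placeholder.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(conv_a, n_a, conv_b, n_b, min_n, days_significant,
           alpha=0.05, min_lift=0.05, hold_days=14):
    """Apply the four checks in order; B is assumed to be the new version."""
    if min(n_a, n_b) < min_n:
        return "continue: minimum sample size not reached"
    if two_proportion_p_value(conv_a, n_a, conv_b, n_b) >= alpha:
        return "continue: difference not significant"
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    if (rate_b - rate_a) / rate_a < min_lift:  # relative lift (an assumption)
        return "continue: lift below the threshold"
    if days_significant < hold_days:
        return "continue: waiting for the two-week hold"
    return "stop: adopt version B"


# Made-up counts: 20% vs 23% conversion with 3000 users per group
print(decide(600, 3000, 690, 3000, min_n=2941, days_significant=3))
print(decide(600, 3000, 690, 3000, min_n=2941, days_significant=14))
```

The first call reports that the two-week hold is still running, and the second reports that the test can stop. The sketch does not adjust the 0.05 threshold for repeated looks at the data. Checking often against an unadjusted threshold raises the false-positive rate, which is one reason the minimum sample size and the hold period matter.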

# 🌟 Exercise 5: Applying Bayesian A/B Testing

**Describe how you would set up your prior belief.**
At first, I'll assume the new feature has a 50% chance of making the app better, like flipping a coin. So, I can start with an even guess: it could be better, or it might not be.

**After collecting data, how does the updated belief (posterior distribution) influence your decision?**
After I get some results from testing, I update my guess based on the new information. If the test shows a 65% chance that the feature is better, I feel a little more confident that it helps, but I'm still not 100% sure.

If I'm comfortable with taking chances, I might start using it. But if I want to be super sure, I might wait and get more results to feel safer about my decision.

**What would you do if the posterior probability was only 55%?**
The feature looks better, but only by a tiny margin. I would collect more results, look for bigger changes, or run the test again.
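
The prior-to-posterior update described in this exercise can be sketched with a Beta-Binomial model. This is one possible setup, assumed for illustration: the feature's effect is measured as a success rate (for example conversion), each version starts from a uniform Beta(1, 1) prior, which makes "B is better" a 50/50 guess before any data, and the probability that B beats A is estimated by sampling. The function name and the counts are made up.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate: Beta(1 + successes, 1 + failures)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


print(prob_b_beats_a(0, 0, 0, 0))            # no data yet: close to the 50/50 prior
print(prob_b_beats_a(200, 1000, 230, 1000))  # made-up counts after some testing
```

The returned probability is the number to compare against a decision threshold: a value near 0.55 or 0.65 leaves a lot of room for B to be no better than A, so collecting more data is the cautious choice.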

# 🌟 Exercise 6: Implementing Adaptive Experimentation

**Explain how you would adjust the traffic allocation after the first week**
Since Layout C shows higher engagement after the first week, I can shift more traffic to Layout C to gather more data for it. For example, I might adjust the traffic allocation to 50% for Layout C and 25% each for Layout A and B. This allows me to focus more on the layout that's performing better, while still keeping some traffic for the other layouts to confirm the results.

**Describe how you would continue to adapt the experiment in the following weeks.**
I will continue monitoring the engagement for all three layouts. If Layout C’s lead remains steady, I can keep increasing its traffic share.
As I gain more confidence in Layout C's performance, I might allocate even more traffic to it while reducing the share for the other layouts, ensuring a larger sample size for the winning layout. If Layout A or B starts showing better results, I can adjust the traffic again or consider ending the experiment if Layout C continues to perform significantly better.
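
The weekly reallocation could also be automated instead of being set by hand. The sketch below uses a Thompson-sampling-style rule, a technique swapped in here for illustration and not the manual 50/25/25 split described above: each layout's share tracks the estimated probability that it is the best one, with a minimum share so that no layout is starved of data. The function name, the 10% floor, and the first-week counts are all assumptions.

```python
import random


def allocate_traffic(results, floor=0.10, draws=20000, seed=0):
    """Traffic shares based on the estimated probability that each layout is best.

    results maps layout name -> (engaged_users, total_users).
    Each layout keeps at least `floor` of the traffic.
    """
    rng = random.Random(seed)
    best_counts = {name: 0 for name in results}
    for _ in range(draws):
        # One draw per layout from its Beta(1 + successes, 1 + failures) posterior
        sampled = {name: rng.betavariate(1 + s, 1 + n - s) for name, (s, n) in results.items()}
        best_counts[max(sampled, key=sampled.get)] += 1
    spare = 1 - floor * len(results)  # traffic left after every layout gets its floor
    return {name: floor + spare * count / draws for name, count in best_counts.items()}


# Made-up first-week numbers in which Layout C leads
week_one = {"A": (100, 1000), "B": (105, 1000), "C": (130, 1000)}
for name, share in allocate_traffic(week_one).items():
    print(f"Layout {name}: {share:.0%} of traffic")
```

Rerunning the function each week with cumulative counts shifts traffic gradually, and the floor keeps Layouts A and B in the test so an early lead for C can still be overturned.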

**What challenges might you face with adaptive experimentation, and how would you address them?**
Changing the traffic distribution too early can introduce bias, especially if I stop the test too soon. To mitigate this, I would use statistical methods like Bayesian analysis to avoid making premature decisions without sufficient data.

If I focus too much on the early data, I might become overly confident in Layout C, even if the early results are just a fluke. To prevent this, I will run the experiment for an appropriate duration, ensuring I account for any randomness and validate the results.




