# PP422 - Class 08 Power Analysis

In [1]:
# Loading initial packages
import numpy as np
from scipy.stats import norm

Let's return to the example from last class about you and your mum disagreeing about the current support for Labour. Your mum thinks support is 40%, but you think it could be higher, at 45%. Suppose you intend to survey 100 people to see whether you can collect evidence to convince her. What is the power of this design?

In [2]:
# Specify initial parameters
p_mum = 0.4
p_you = 0.45
n_samp = 100

# Distribution Under the Null Hypothesis

In this case, we can treat your mum's value as the null hypothesis that we wish to reject. Based on the handout formulas, we can calculate the standard error of our sampling distribution under the null.
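
Assuming the handout formula is the usual standard error of a sample proportion (the handout itself is not reproduced here, and the symbols $p_0$ and $SE_0$ are notation chosen for this illustration), the calculation with $p_0 = 0.40$ and $n = 100$ works out as:

$$
SE_0 = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.40 \times 0.60}{100}} = \sqrt{0.0024} \approx 0.049
$$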

In [3]:
# Calculate the initial standard error of the sampling distribution under the null hypothesis 
se_mum = np.sqrt((p_mum * (1-p_mum) / n_samp))
print(se_mum)

0.04898979485566356


With this standard error, we can calculate the rejection region for the null hypothesis. If we use a significance level of $\alpha = 0.05$ for a two-sided test, the rejection region consists of all values at least $1.96$ standard errors away from the null value in either direction.

In [4]:
# Identify the critical region under the null hypothesis
print(p_mum - 1.96*se_mum, p_mum + 1.96*se_mum)

0.30398000208289944 0.4960199979171006
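
The $1.96$ multiplier does not have to be memorised: it is the 97.5th percentile of the standard normal distribution, so it can be recovered with `norm.ppf`. A minimal sketch, where `alpha` and `z_crit` are illustrative names not used elsewhere in this notebook:

```python
from scipy.stats import norm

alpha = 0.05

# Two-sided test: alpha is split equally between the two tails,
# so the upper cut-off is the (1 - alpha/2) quantile of the standard normal
z_crit = norm.ppf(1 - alpha / 2)
print(z_crit)  # approximately 1.96
```

Changing `alpha` gives the multiplier for other significance levels.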


## __Question:__ What are the above values telling us?

The calculation tells us that, to reject the null hypothesis of $p = 0.40$ with a sample of 100 people, we would need an observed proportion of Labour support in our sample below $30.4\%$ or above $49.6\%$.
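
As a quick illustration of this decision rule, the sketch below checks a few made-up sample proportions against the two thresholds. The helper `rejects_null` and the example values are hypothetical and are not part of the class material.

```python
import numpy as np

p_mum = 0.4
n_samp = 100

# Standard error and rejection thresholds under the null hypothesis
se_mum = np.sqrt(p_mum * (1 - p_mum) / n_samp)
lower = p_mum - 1.96 * se_mum
upper = p_mum + 1.96 * se_mum

def rejects_null(p_hat):
    """Return True if an observed proportion falls in the rejection region."""
    return bool(p_hat < lower or p_hat > upper)

# Hypothetical observed sample proportions
for p_hat in [0.30, 0.45, 0.50]:
    print(p_hat, rejects_null(p_hat))
```

Note that a sample proportion of exactly 45%, the value you believe is true, would not be enough to reject the null with only 100 respondents.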


__Next__, to calculate power, we ask how likely it would be, under our alternative hypothesized effect size, to obtain a sample that successfully rejects this null hypothesis.

# Distribution Under Some Assumed Alternative Effect or Treatment Effect

The above information draws upon knowledge of a sampling distribution _assuming the null_. In a power analysis, we __supplement__ this with another assumed distribution, namely the distribution under some alternative hypothesized effect size. As before, when dealing with proportions, if we assume a specific value for $p$, we can estimate the standard error of our sampling distribution with the familiar formula.

In [5]:
# Calculate the standard error under your hypothesized effect size of 45%
se_you = np.sqrt((p_you * (1-p_you))/n_samp)
print(se_you)

0.049749371855331


We can now figure out how likely it is that a draw __from the alternative distribution__ would fall in the rejection region of the __null distribution__.

In [6]:
p_mum - 1.96*se_mum

np.float64(0.30398000208289944)

In [7]:
# Probability that a draw from the alternative distribution falls in the lower rejection region
norm.cdf(p_mum - 1.96*se_mum, loc=p_you, scale=se_you)

np.float64(0.0016671344555012275)

In [8]:
p_mum + 1.96*se_mum

np.float64(0.4960199979171006)

In [9]:
# Probability that a draw from the alternative distribution falls in the upper rejection region
1-norm.cdf(p_mum + 1.96*se_mum, loc=p_you, scale=se_you)

np.float64(0.17747339234446224)

We can also calculate this upper tail equivalently with the `.sf()` method.

In [10]:
norm.sf(p_mum + 1.96*se_mum, loc=p_you, scale=se_you)

np.float64(0.1774733923444622)

Total power will be the sum of the probabilities across these two rejection regions.

In [11]:
# Calculating total power
norm.cdf(p_mum - 1.96*se_mum, loc=p_you, scale=se_you) + (1 - norm.cdf(p_mum + 1.96*se_mum, loc=p_you, scale=se_you))

np.float64(0.1791405267999635)
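
One way to sanity-check this figure is by simulation: draw many surveys of 100 people from a population where true support really is 45%, and count how often the sample proportion lands in the rejection region. This is a rough sketch rather than part of the original calculation; the seed, the number of simulations, and the names `n_sims`, `p_hat` and `sim_power` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(422)  # arbitrary seed, for reproducibility only

p_mum, p_you, n_samp = 0.4, 0.45, 100
n_sims = 100_000  # number of simulated surveys (assumed value)

# Rejection thresholds under the null hypothesis
se_mum = np.sqrt(p_mum * (1 - p_mum) / n_samp)
reject_lower = p_mum - 1.96 * se_mum
reject_upper = p_mum + 1.96 * se_mum

# Simulate surveys in a world where true support is p_you,
# converting each count of supporters into a sample proportion
p_hat = rng.binomial(n_samp, p_you, size=n_sims) / n_samp

# Share of simulated surveys that would reject the null
sim_power = np.mean((p_hat < reject_lower) | (p_hat > reject_upper))
print(sim_power)
```

The simulated share should be close to, but not identical to, the value above, because the simulation uses exact binomial counts whereas the formula relies on the normal approximation.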

# Bringing everything together into one calculation

In [12]:
# Key parameters (note the larger sample size: 1000 rather than 100)
p_mum = 0.4
p_you = 0.45
n_samp = 1000

# SE calculations
se_mum = np.sqrt((p_mum * (1-p_mum) / n_samp))
se_you = np.sqrt((p_you * (1-p_you))/n_samp)

# Rejection region thresholds
reject_lower = p_mum - 1.96*se_mum
reject_upper = p_mum + 1.96*se_mum

# Final power calculation
norm.cdf(reject_lower, loc=p_you, scale=se_you) + (1 - norm.cdf(reject_upper, loc=p_you, scale=se_you))

np.float64(0.8940091776612397)
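
Since the same steps apply for any sample size, they can be wrapped in a function and evaluated over several designs. The sketch below is one possible way to do this; `power_one_proportion` is a hypothetical helper, not a library function, and the sample sizes in the loop are chosen only for illustration. It uses `norm.ppf` for the critical value instead of the rounded $1.96$, so results can differ from the cells above in the later decimal places.

```python
import numpy as np
from scipy.stats import norm

def power_one_proportion(p_null, p_alt, n, alpha=0.05):
    """Approximate power of a two-sided test of one proportion
    (normal approximation; hypothetical helper for illustration)."""
    z_crit = norm.ppf(1 - alpha / 2)

    # Standard errors under the null and under the alternative
    se_null = np.sqrt(p_null * (1 - p_null) / n)
    se_alt = np.sqrt(p_alt * (1 - p_alt) / n)

    # Rejection thresholds, defined by the null distribution
    lower = p_null - z_crit * se_null
    upper = p_null + z_crit * se_null

    # Probability that a draw from the alternative lands in either tail
    return (norm.cdf(lower, loc=p_alt, scale=se_alt)
            + norm.sf(upper, loc=p_alt, scale=se_alt))

# Power for a range of illustrative sample sizes
for n in [100, 250, 500, 1000]:
    print(n, round(power_one_proportion(0.4, 0.45, n), 3))
```

Power rises with the sample size because both sampling distributions become narrower, so the alternative distribution overlaps less with the region where the null is not rejected.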