# Statistical Power

Statistical power, in simple terms, is the likelihood of a hypothesis test detecting an effect if that effect truly exists.
- It plays a crucial role in the reliability and confidence of study results. This concept can also help us determine the sample size needed to detect an effect in an experiment.
- In this tutorial, we'll explore the significance of statistical power in hypothesis testing and learn how to perform power analyses and create power curves for experimental design.

By the end of this tutorial, you will gain insights into the following key points:

- **Statistical Power**: It represents the probability of a hypothesis test revealing an effect if there is indeed an effect to be discovered.

- **Power Analysis**: You can use power analysis to estimate the minimum sample size required for an experiment, considering factors like your desired significance level, effect size, and the statistical power you aim to achieve.

- **Calculating and Visualizing Power**: We'll delve into the practical aspects of calculating and visualizing power analysis for the Student’s t-test using Python. This knowledge will empower you to design experiments more effectively.


---


## Statistical Hypothesis Testing

- Statistical hypothesis testing is a method that involves making an assumption about the outcome, known as the null hypothesis.
  - For instance, in a Pearson’s correlation test, the null hypothesis asserts that there's no connection between two variables.

  - In the case of the Student’s t-test, the null hypothesis suggests there's no distinction between the means of two populations.

- To interpret the results of a hypothesis test, we often use a p-value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one found in our data.
  - It's essential to understand that the p-value tells us the likelihood of our observation given that the null hypothesis is accurate, not the other way around (a common misconception).

- **p-value (p)**: The probability, computed under the null hypothesis, of obtaining a result equal to or more extreme than what we observed in the data.

  - In the interpretation of p-values, you must choose a significance level, often represented by the Greek lowercase letter alpha (α).

  - A typical value for the significance level is 5%, denoted as 0.05. The p-value's relevance depends on the chosen significance level.

  - If the p-value is less than or equal to the significance level, you can claim that the result is statistically significant and reject the null hypothesis.

    - **p-value ≤ alpha**: This indicates a significant result. You reject the null hypothesis, suggesting that the distributions differ (H1).

    - **p-value > alpha**: This indicates a non-significant result. You fail to reject the null hypothesis: the data do not provide enough evidence that the distributions differ, which is not the same as showing that they are identical (H0).

- In this context, the significance level (alpha) can be seen as the probability of rejecting the null hypothesis when it's true, or the likelihood of making a Type I Error, which is a false positive.
  - The p-value is the quantity we compare against alpha to decide whether to reject the null hypothesis.

- When dealing with p-values, it's important to remember that errors can occur:

  - **Type I Error**: This happens when you reject the null hypothesis even though there is no true effect (a false positive). The p-value came out small by chance, making the result look more promising than it is.

  - **Type II Error**: This occurs when you fail to reject the null hypothesis even though a true effect exists (a false negative). The p-value came out large despite the effect, so the effect is missed.
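The decision rule and the meaning of alpha described above can be illustrated with a short simulation. This is a minimal sketch, assuming SciPy and NumPy are available; the sample sizes, means, seed, and variable names are hypothetical choices for illustration only. The first part applies the p-value rule to one pair of samples, and the second part repeats an experiment many times with the null hypothesis true to show that the share of false positives lands close to alpha.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
alpha = 0.05

# One experiment: two hypothetical Gaussian samples whose true means differ by 0.5.
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.5, scale=1.0, size=50)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")

# Many experiments with H0 true (both groups share the same mean).
# Each row is one simulated experiment; ttest_ind runs along axis=1.
n_trials = 2000
x = rng.normal(size=(n_trials, 30))
y = rng.normal(size=(n_trials, 30))
null_p_values = stats.ttest_ind(x, y, axis=1).pvalue

# Every rejection here is a Type I error, so the rate should be near alpha.
type1_rate = float(np.mean(null_p_values <= alpha))
print(f"Observed Type I error rate: {type1_rate:.3f}")
```

Because the second part draws both groups from the same distribution, any "significant" result in it is a false positive by construction.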


---


## What Is Statistical Power?

- Statistical power, or the power of a hypothesis test, is a measure of the test's ability to correctly reject the null hypothesis.
  - In simpler terms, it's the probability of correctly identifying a true effect, known as a true positive result.
  - Statistical power is only meaningful when the null hypothesis is actually false, that is, when there is a real effect to detect.

- The key concept is that statistical power pertains to the ability of a test to accurately reject a false null hypothesis. In other words, statistical power only matters when the null hypothesis is false.

- The higher the statistical power in an experiment, the lower the chances of making a Type II error, also called a false negative.
  - In practical terms, this means a higher likelihood of detecting an effect when it's genuinely present.
  - In fact, statistical power is precisely the complement of the probability of making a Type II error.

- Mathematically, this relationship can be expressed as:

  ```
  Power = 1 - Pr(Type II Error)
  Pr(True Positive) = 1 - Pr(False Negative)
  ```

- In simpler terms, statistical power can be thought of as the probability of accepting an alternative hypothesis when the alternative hypothesis is true.
  - When evaluating statistical power, we aim for experimental setups that have high power.

  - **Low Statistical Power**: This implies a high risk of committing Type II errors (false negatives).

  - **High Statistical Power**: This signifies a low risk of making Type II errors.

- It's crucial to understand that experimental results with insufficient statistical power can lead to invalid conclusions about the significance of the findings.
  - Therefore, researchers often design experiments with a minimum statistical power of 80% (0.80) or better. This means that, when a true effect exists, there is up to a 20% chance of a Type II error, which is notably higher than the 5% chance of a Type I error implied by the standard significance level.
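The relationship between power and Type II errors can also be checked empirically. The following is a rough Monte Carlo sketch, assuming SciPy and NumPy are available; `simulated_power` is a hypothetical helper written for this illustration, not a library function. It simulates many experiments in which a true effect exists and reports the fraction in which the t-test rejects the null hypothesis, which is an estimate of power.

```python
import numpy as np
from scipy import stats


def simulated_power(effect_size, n_per_group, alpha=0.05, n_trials=5000, seed=0):
    """Estimate power as the share of simulated experiments that reject H0.

    Both groups have unit standard deviation, so the difference in means
    equals the effect size expressed in standard deviations (Cohen's d).
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=(n_trials, n_per_group))
    b = rng.normal(effect_size, 1.0, size=(n_trials, n_per_group))
    p_values = stats.ttest_ind(a, b, axis=1).pvalue  # one test per row
    return float(np.mean(p_values <= alpha))


# Same true effect, two different sample sizes per group.
power_small_n = simulated_power(effect_size=0.8, n_per_group=10)
power_large_n = simulated_power(effect_size=0.8, n_per_group=26)

for label, power in [("n=10", power_small_n), ("n=26", power_large_n)]:
    # Type II error rate is the complement of power.
    print(f"{label}: power ~ {power:.3f}, Type II error rate ~ {1 - power:.3f}")
```

With the effect held fixed, the larger sample rejects the null hypothesis far more often, which is the practical meaning of higher power.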


---


## Understanding Power Analysis

- Power analysis is a critical component of experimental design, involving four interconnected factors:

  - **Effect Size**: The magnitude of an effect in the population. It's quantified using a standardized measure such as Pearson's correlation coefficient for relationships between variables or Cohen's d for differences between groups.

  - **Sample Size**: The number of observations in your sample.

  - **Significance**: The chosen significance level (often 5% or 0.05) for your statistical test.

  - **Statistical Power**: The probability of correctly accepting the alternative hypothesis if it's true.

- These four variables are closely linked.
  - For instance, a larger sample size makes it easier to detect an effect, and raising the significance level increases statistical power, but only at the cost of a higher Type I error rate.
  - Power analysis involves estimating one of these factors based on known values of the other three, making it a crucial tool for designing and analyzing experiments that rely on statistical hypothesis tests.

- One common application of power analysis is to determine the minimum sample size needed for an experiment.
  - Typically, power analyses are conducted before a study begins, and they are often used to estimate sample sizes.

- To get started, you can begin with sensible defaults for some parameters, such as a significance level of 0.05 and a power level of 0.80.
  - You can then estimate the minimum effect size specific to your experiment and use power analysis to calculate the required sample size.

- Furthermore, you can run multiple power analyses to create curves or plots showing the relationships between parameters.

  - For example, you can explore how changes in sample size affect the power to detect a given effect size, or the smallest effect size that can be detected at a given power. This flexible approach is a valuable tool for experimental design and helps make research outcomes more robust.
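The idea of fixing three of the four quantities and solving for the fourth can be sketched with the Statsmodels `TTestIndPower` class. The specific numbers below (20 or 30 observations per group, an effect size of 0.8) are hypothetical values chosen only to illustrate the calls.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Unknown: power. Known: effect size, sample size per group, alpha.
power_at_20 = analysis.power(effect_size=0.8, nobs1=20, alpha=0.05, ratio=1.0)
print(f"Power with 20 per group: {power_at_20:.3f}")

# Same design with a looser significance level: power goes up,
# but so does the accepted Type I error rate.
power_at_20_loose = analysis.power(effect_size=0.8, nobs1=20, alpha=0.10, ratio=1.0)
print(f"Power with alpha=0.10:   {power_at_20_loose:.3f}")

# Unknown: effect size. Known: sample size per group, alpha, power.
# Passing effect_size=None asks solve_power to solve for it.
min_effect = analysis.solve_power(
    effect_size=None, nobs1=30, alpha=0.05, power=0.8, ratio=1.0
)
print(f"Smallest detectable effect with 30 per group: {min_effect:.3f}")
```

Exactly one argument should be left as `None` in each `solve_power` call; that is the quantity being estimated.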


---




## Student’s t-Test Power Analysis

Let's dive into the concept of statistical power and power analysis using a practical example.

- In this section, we'll focus on the Student’s t-test, a statistical hypothesis test used to compare the means of two samples of Gaussian variables.
  - The null hypothesis for this test assumes that both sample populations have the same mean, meaning there's no difference between the samples.

- The test provides a p-value, which we can interpret to decide whether there is no evidence of a difference between the samples (failing to reject the null hypothesis) or a statistically significant difference between them (rejecting the null hypothesis).
  - A common significance level for interpreting the p-value is 5% (0.05).

    - **Significance Level (alpha)**: 5% or 0.05.

- To quantify the size of the difference between two groups, we use an effect size measure.
  - Cohen’s d is a common choice, representing the difference in means in terms of standard deviations. A large effect size for Cohen’s d is typically 0.80 or higher.

    - **Effect Size**: Cohen’s d of at least 0.80.

- Let's work with a default of a minimum statistical power of 80% or 0.80.

  - **Statistical Power**: 80% or 0.80.

- For our specific experiment with these defaults, we aim to estimate a suitable sample size.
  - In other words, we want to determine how many observations are needed in each sample to have an 80% chance of detecting an effect of size 0.80 if it exists (that is, a 20% chance of a Type II error), while accepting a 5% chance of detecting an effect when there is none (a Type I error). We can use power analysis for this purpose.

- We'll use the Statsmodels library's TTestIndPower class to calculate a power analysis for the Student’s t-test with independent samples.
  - In our case, we want to estimate the sample size.
  - We'll provide the known values of alpha, effect, and power, and set the argument for sample size (nobs1) to None to instruct the function to calculate it.

  ```python
  # estimate sample size via power analysis
  from statsmodels.stats.power import TTestIndPower

  # parameters for power analysis
  effect = 0.8
  alpha = 0.05
  power = 0.8

  # perform power analysis
  analysis = TTestIndPower()
  result = analysis.solve_power(effect_size=effect, nobs1=None, alpha=alpha, power=power, ratio=1.0)
  print('Sample Size: %.3f' % result)
  ```

- Running this example calculates and prints the suggested minimum sample size for each group. Here the estimate is approximately 25.5, so in practice you would round up to 26 observations per sample.

- We can further explore by creating power curves, which illustrate how changes in variables, such as effect size and sample size, affect the power of the statistical test.
  - The Statsmodels library provides a convenient function, `plot_power()`, to generate these curves. You can customize the curves by varying the parameters. Here's an example:

  ```python
  # calculate power curves for varying sample and effect size
  from numpy import array
  from matplotlib import pyplot
  from statsmodels.stats.power import TTestIndPower

  # parameters for power analysis
  effect_sizes = array([0.2, 0.5, 0.8])
  sample_sizes = array(range(5, 100))

  # calculate power curves from multiple power analyses
  analysis = TTestIndPower()
  analysis.plot_power(dep_var='nobs', nobs=sample_sizes, effect_size=effect_sizes)
  pyplot.show()
  ```

- Running this code will create power curves that show how the statistical power changes with different sample sizes and effect sizes. You can visualize the point of diminishing returns in terms of statistical power for various scenarios.
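Two practical details from this section can be sketched together: computing Cohen's d from data, and turning a fractional sample-size estimate into a whole number of observations. The snippet below is a minimal illustration, not a definitive workflow. `cohens_d` is a hypothetical helper using the pooled standard deviation, and the tiny "pilot" arrays are made-up toy values, far too small for a real pilot study.

```python
import math

import numpy as np
from statsmodels.stats.power import TTestIndPower


def cohens_d(x, y):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)
    ) / (nx + ny - 2)
    return float((np.mean(x) - np.mean(y)) / math.sqrt(pooled_var))


# Toy pilot measurements (hypothetical values for illustration only).
pilot_treatment = np.array([2.0, 4.0, 6.0])
pilot_control = np.array([1.0, 3.0, 5.0])
d = cohens_d(pilot_treatment, pilot_control)
print(f"Cohen's d from pilot data: {d:.2f}")

# Estimate the per-group sample size for that effect, then round up,
# because a fractional observation cannot be collected.
analysis = TTestIndPower()
n_exact = analysis.solve_power(effect_size=d, nobs1=None, alpha=0.05, power=0.8, ratio=1.0)
n_per_group = math.ceil(n_exact)

# Confirm that the rounded-up size reaches the target power.
achieved_power = analysis.power(effect_size=d, nobs1=n_per_group, alpha=0.05, ratio=1.0)
print(f"Estimated n: {n_exact:.2f} -> use {n_per_group} per group")
print(f"Achieved power: {achieved_power:.3f}")
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.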


---


## Further Reading

### Papers

- [Using Effect Size—or Why the P Value Is Not Enough, 2012](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/)

### Books

- [The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results, 2010](https://amzn.to/2JDcwSe)
- [Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, 2011](https://amzn.to/2v0wKSI)
- [Statistical Power Analysis for the Behavioral Sciences, 1988](https://amzn.to/2GNcmtu)
- [Applied Power Analysis for the Behavioral Sciences, 2010](https://amzn.to/2GPS3vI)

### API Documentation

- [Statsmodels Power and Sample Size Calculations](http://www.statsmodels.org/dev/stats.html#power-and-sample-size-calculations)
- [statsmodels.stats.power.TTestPower API](http://www.statsmodels.org/dev/generated/statsmodels.stats.power.TTestPower.html)
- [statsmodels.stats.power.TTestIndPower API](http://www.statsmodels.org/dev/generated/statsmodels.stats.power.TTestIndPower.html)
- [statsmodels.stats.power.TTestIndPower.solve_power API](http://www.statsmodels.org/dev/generated/statsmodels.stats.power.TTestIndPower.solve_power.html)
- [statsmodels.stats.power.TTestIndPower.plot_power API](http://www.statsmodels.org/dev/generated/statsmodels.stats.power.TTestIndPower.plot_power.html)

### Articles

- [Statistical power on Wikipedia](https://en.wikipedia.org/wiki/Statistical_power)
- [Statistical hypothesis testing on Wikipedia](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing)
- [Statistical significance on Wikipedia](https://en.wikipedia.org/wiki/Statistical_significance)
- [Sample size determination on Wikipedia](https://en.wikipedia.org/wiki/Sample_size_determination)
- [Effect size on Wikipedia](https://en.wikipedia.org/wiki/Effect_size)
- [Type I and type II errors on Wikipedia](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors)

---
