# Statistical Testing
## Power

1. Estimates  
2. Sampling
3. **Power: Significance Level, Power, & Power Analysis**
4. Statistical Tests

## Significance Level, $\alpha$

The significance level is used to interpret the p-value of a significance test.  
You can think of the significance level as *the probability of rejecting the null hypothesis if it were true.*  

- It is identified by $\alpha$, and is commonly set to a value of .05.   
- If the p-value <= $\alpha$, we reject the null hypothesis, H0  
- If the p-value > $\alpha$, we fail to reject the null hypothesis, H0   
- The result is considered “statistically significant” if the p-value is less than or equal to $\alpha$, the significance level.
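
The decision rule in the list above can be sketched as a small helper. This is only an illustration: the function name `interpret_p_value` is hypothetical (it is not part of any library), and the default `alpha` of 0.05 is the common choice mentioned above, not a requirement.

```python
def interpret_p_value(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when p-value <= alpha, otherwise fail to reject."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical p-values, chosen only to exercise both branches of the rule.
for p in (0.01, 0.05, 0.20):
    print(p, "->", interpret_p_value(p))
```

Lowering `alpha` (for example to 0.01) makes the rule stricter, so a p-value of 0.03 would no longer lead to rejecting H0.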

## Statistical Power

https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/

Statistical power is the probability of a true positive result. It is only relevant when the null hypothesis is false.

*statistical power is the probability that a test will correctly reject a false null hypothesis. Statistical power has relevance only when the null is false.*  — Page 60, The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results, 2010.

The higher the statistical power:

- the lower the probability of making a Type II (false negative) error, since 1 - Power = P(Type II error);
- the higher the probability of detecting an effect when there is an effect.

`Power = 1 - Type II Error
Pr(True Positive) = 1 - Pr(False Negative)`

Experiments with too little statistical power can lead to invalid conclusions about the meaning of the results. Therefore, a minimum level of statistical power should be targeted when a study is designed.

Common value: power = .8 (80%) or better
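
To make the relation Power = 1 - Pr(Type II error) concrete, here is a minimal sketch that approximates the power of a two-sided, two-sample test with a normal (z) approximation. The function name `approx_power` and the inputs (effect size 0.8, 26 observations per group) are illustrative assumptions; an exact t-test calculation gives slightly different numbers.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test with equal group sizes."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)  # expected value of the z statistic
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = approx_power(effect_size=0.8, n_per_group=26)
type_ii = 1 - power  # probability of a false negative
print(f"power = {power:.3f}, P(Type II error) = {type_ii:.3f}")
```

With an effect size of zero the same formula returns `alpha`, which matches the definition of the significance level as the rejection probability when the null hypothesis is true.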


### Summarizing Significance Level & Power

- A statistical power of .80 means a 20% probability of a Type II error (false negative): not detecting a significant effect when there actually is one.
- A significance level of .05 means a 5% probability of a Type I error (false positive): detecting a significant effect when there is not one.
- Low statistical power: large risk of committing Type II errors (false negatives).
- High statistical power: small risk of committing Type II errors.
- Low significance level: small risk of committing Type I errors (false positives).
- High significance level: large risk of committing Type I errors.
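
The two error rates summarized above can also be checked by simulation. The sketch below is an assumption-laden illustration: it uses a two-sample z-test with a known standard deviation of 1 (rather than a t-test), a fixed seed, and a hypothetical helper named `rejection_rate`. When the true difference is zero, the rejection rate estimates the Type I error rate; when the true difference is 0.8, it estimates the power.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_diff, n_per_group=26, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated experiments in which H0 (equal means) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(2 / n_per_group)  # both groups have known sigma = 1
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        z = (fmean(group_b) - fmean(group_a)) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_diff=0.0)     # H0 true: rejections are false positives
power_estimate = rejection_rate(true_diff=0.8)  # H0 false: rejections are true positives
print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power: {power_estimate:.3f}")
print(f"estimated Type II error rate: {1 - power_estimate:.3f}")
```

The estimates fluctuate with the seed and the number of trials, so they should land near, not exactly on, the nominal values.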

## Power Analysis

### Description
Power analysis is a process for estimating one of the following four parameters given values for three other parameters.  It is normally run before a study is conducted.

1. **Effect Size:** *The quantified magnitude of a result present in the population.* Effect size is calculated using a specific statistical measure, such as Pearson’s correlation coefficient for the relationship between variables or Cohen’s d for the difference between groups.
2. **Sample Size:** The number of observations in the sample.
3. **Significance Level:** $\alpha$, common value =  5% or 0.05.
4. **Statistical Power:** The probability of accepting the alternative hypothesis if it is true, common value = 80% or .80

*Power analysis answers questions like “how much statistical power does my study have?” and “how big a sample size do I need?”.*  — Page 56, The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results, 2010.

- Example: the statistical power can be estimated given an effect size, sample size, and significance level.
- Example: the minimum sample size required for an experiment can be estimated for different desired levels of significance.

### Steps
1. Start with sensible defaults for some parameters, such as a significance level of 0.05 and a power level of 0.80. 
2. Estimate a desirable minimum effect size, specific to the experiment being performed. 
3. Power analysis can then be used to estimate the minimum sample size required.
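
The three steps above can be sketched with a closed-form normal approximation for a two-sided, two-sample comparison with equal group sizes: $n \approx 2\left((z_{1-\alpha/2} + z_{\text{power}}) / d\right)^2$ observations per group. This is an approximation and not what a t-test power analysis computes exactly; the function name `approx_n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # step 1: default significance level
    z_power = std_normal.inv_cdf(power)          # step 1: default power
    # Steps 2 and 3: plug in the minimum effect size of interest and solve for n.
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {ceil(approx_n_per_group(d))} observations per group")
```

Because the required sample size scales with $1/d^2$, halving the effect size of interest roughly quadruples the number of observations needed. Exact t-test calculations usually give a slightly larger number than this approximation.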


The `statsmodels.stats.power` module currently implements power and sample size calculations for t-tests, normal-based tests, F-tests, and the chi-square goodness-of-fit test. The implementation is class based, but the module also provides three shortcut functions, `tt_solve_power`, `tt_ind_solve_power`, and `zt_ind_solve_power`, to solve for any one of the parameters of the power equations.

| Class or function | Description |
| --- | --- |
| `TTestIndPower()` | Statistical power calculations for a t-test for two independent samples |
| `TTestPower()` | Statistical power calculations for a one-sample or paired-sample t-test |
| `GofChisquarePower()` | Statistical power calculations for a one-sample chi-square test |
| `NormalIndPower()` | Statistical power calculations for a z-test for two independent samples |
| `FTestAnovaPower()` | Statistical power calculations for an F-test for one-factor balanced ANOVA |
| `FTestPower()` | Statistical power calculations for a generic F-test |
| `tt_solve_power` | Solve for any one parameter of the power of a one-sample t-test |
| `tt_ind_solve_power` | Solve for any one parameter of the power of a two-sample t-test |
| `zt_ind_solve_power` | Solve for any one parameter of the power of a two-sample z-test |
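
As a brief illustration of the shortcut functions, the sketch below calls `tt_ind_solve_power` twice, each time leaving a different parameter as `None` so that it is the one solved for. The input values (effect size 0.5 with 64 observations per group, and 40 observations per group with a target power of 0.8) are assumptions chosen only for the example.

```python
from statsmodels.stats.power import tt_ind_solve_power

# Solve for power: effect size, sample size and alpha are known.
achieved_power = tt_ind_solve_power(
    effect_size=0.5, nobs1=64, alpha=0.05, power=None, ratio=1.0
)

# Solve for the smallest detectable effect size: sample size, alpha and power are known.
min_effect = tt_ind_solve_power(
    effect_size=None, nobs1=40, alpha=0.05, power=0.8, ratio=1.0
)

print(f"power with d=0.5 and 64 per group: {float(achieved_power):.3f}")
print(f"minimum detectable effect with 40 per group: {float(min_effect):.3f}")
```

Exactly one of `effect_size`, `nobs1`, `alpha`, and `power` should be `None` in each call.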


### Example: Student's  t-Test Power Analysis

Student’s t-test is used to compare the means of two samples of Gaussian variables.
The assumption, or null hypothesis, is that the sample populations have the same mean, i.e. that there is no difference between the samples or that the samples are drawn from the same underlying population.

The test will calculate a p-value that can be interpreted as to whether the samples are the same (fail to reject the null hypothesis), or there is a statistically significant difference between the samples (reject the null hypothesis). 

- Significance level (alpha): 5% or 0.05.
- Effect size: Cohen’s d of at least 0.80. Cohen's d is a common measure of the difference between the means of two groups. It is a standardized score that expresses the difference between the means in units of standard deviations.
- Statistical power: 80% or 0.8.
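
For reference, Cohen's d can be computed from two samples as the difference in means divided by the pooled standard deviation. The sketch below is a minimal illustration; the function name `cohens_d` and the two small samples are made up for the example and are not data from any study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Difference in means expressed in units of the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


# Hypothetical measurements for two groups.
treated = [2, 4, 6, 8]
control = [1, 3, 5, 7]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.3f}")
```

The sign of d depends on the order of the arguments; power calculations for a two-sided test only depend on its magnitude.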

Now we want to estimate a suitable sample size, i.e. how many observations are required in each sample to detect an effect of at least 0.80, with an 80% chance of detecting the effect if it exists (a 20% chance of a Type II error) and a 5% chance of detecting an effect when there is no such effect (Type I error).

We solve this using a power analysis.

#### power analysis

`statsmodels.stats.power.TTestIndPower` is used to run a power analysis for the Student’s t-test with independent samples.  
(`TTestPower` is used for the same analysis for the paired Student’s t-test.)  
A `TTestIndPower` instance must be created first; we can then call `solve_power()` with our arguments to estimate the sample size for the experiment.  
`solve_power()` calculates one of the four parameters in a power analysis, e.g. the sample size. We provide the three values we know (alpha, effect size, and power) and set the argument we wish to solve for (`nobs1`) to `None`. This tells the function what to calculate.  

*A note on sample size: the function has an argument called `ratio`, which is the ratio of the number of observations in the second sample to the number in the first (`nobs2 = nobs1 * ratio`). If both samples are expected to have the same number of observations, then the ratio is 1.0. If, for example, the second sample is expected to have half as many observations, then the ratio would be 0.5.*
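
To illustrate the effect of `ratio`, the sketch below solves for `nobs1` twice: once with equal group sizes and once with the second group half the size of the first. The parameter values (effect size 0.8, alpha 0.05, power 0.8) are the ones used in this example section; the variable names are arbitrary.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Equal group sizes: nobs2 = nobs1.
n1_equal = float(
    analysis.solve_power(effect_size=0.8, nobs1=None, alpha=0.05, power=0.8, ratio=1.0)
)

# Unbalanced design: nobs2 = 0.5 * nobs1.
n1_unequal = float(
    analysis.solve_power(effect_size=0.8, nobs1=None, alpha=0.05, power=0.8, ratio=0.5)
)

print(f"nobs1 with ratio=1.0: {n1_equal:.1f}")
print(f"nobs1 with ratio=0.5: {n1_unequal:.1f} (second group: {0.5 * n1_unequal:.1f})")
```

With a smaller second group, the first group has to grow to keep the same power, so an unbalanced design generally needs more observations in total than a balanced one.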

`analysis = TTestIndPower()
result = analysis.solve_power(effect, power=power, nobs1=None, ratio=1.0, alpha=alpha)`

Estimate sample size via power analysis:  

`from statsmodels.stats.power import TTestIndPower
effect = 0.8
alpha = 0.05
power = 0.8
analysis = TTestIndPower()
result = analysis.solve_power(effect, power=power, nobs1=None, ratio=1.0, alpha=alpha)
print('Sample Size: %.3f' % result)`

Running the example calculates and prints the estimated number of observations per sample as approximately 25.5, which rounds up to 26. This is the suggested minimum number of observations per sample required to detect an effect of the desired size.
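
Since a study cannot include a fractional observation, a natural follow-up is to round the solved sample size up and check the power actually achieved at that integer size. The sketch below does this with the `power()` method of `TTestIndPower`, using the same parameter values as the example above; the variable names are arbitrary.

```python
from math import ceil

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for the (fractional) number of observations per sample.
raw_n = float(
    analysis.solve_power(effect_size=0.8, nobs1=None, alpha=0.05, power=0.8, ratio=1.0)
)

# Round up to a whole number of observations, then compute the achieved power.
n_per_group = ceil(raw_n)
achieved = float(analysis.power(effect_size=0.8, nobs1=n_per_group, alpha=0.05, ratio=1.0))

print(f"solved sample size: {raw_n:.3f}")
print(f"rounded up: {n_per_group} per group, achieved power: {achieved:.3f}")
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.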

#### power curves

Power curves are line plots that show how changes in variables, such as effect size and sample size, affect the power of the statistical test.

The `plot_power()` function can be used to create power curves. The variable shown on the x-axis must be specified by name in the `dep_var` argument. Arrays of values can then be specified for the sample size (`nobs`), effect size (`effect_size`), and significance (`alpha`) parameters. One or multiple curves will then be plotted showing the impact on statistical power.

For example, we can assume a significance of 0.05 (the default for the function) and explore the change in sample size between 5 and 100 with low, medium, and high effect sizes.

Calculate power curves for varying sample and effect size:  

`from numpy import array
from matplotlib import pyplot
from statsmodels.stats.power import TTestIndPower
effect_sizes = array([0.2, 0.5, 0.8])
sample_sizes = array(range(5, 100))
analysis = TTestIndPower()
analysis.plot_power(dep_var='nobs', nobs=sample_sizes, effect_size=effect_sizes)
pyplot.show()`

Running the example creates the plot showing the impact on statistical power (y-axis) for three different effect sizes (es) as the sample size (x-axis) is increased.

We can see that, if we are interested in a large effect, a point of diminishing returns in terms of statistical power occurs at around 40 to 50 observations.
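
The values behind such curves can also be tabulated without plotting, which makes the diminishing returns easier to read off. The sketch below evaluates `TTestIndPower.power()` on a small grid; the grid of sample sizes is an arbitrary choice for illustration, and alpha is fixed at 0.05.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_sizes = [0.2, 0.5, 0.8]                 # low, medium, and high effect sizes
sample_sizes = [10, 20, 30, 40, 50, 75, 100]   # observations per sample

# Power for every combination of effect size and sample size.
table = {
    es: [float(analysis.power(effect_size=es, nobs1=n, alpha=0.05)) for n in sample_sizes]
    for es in effect_sizes
}

print("n per sample: " + "  ".join(f"{n:>5d}" for n in sample_sizes))
for es, powers in table.items():
    print(f"es = {es:<6}  " + "  ".join(f"{p:5.2f}" for p in powers))
```

For the large effect size, the gain in power between 40 and 50 observations per sample is small, while for the small effect size the power is still low even at 100 observations.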