# Stat Quickie: Thresholds for Significance

This notebook addresses a basic question in statistics and data science: what makes a good threshold for significance? The question matters most when interpreting the results of statistical tests.

This notebook is inspired by a video tutorial and elaborates on the ideas presented in it. In statistical hypothesis testing, we use a threshold (also known as the alpha level) to decide whether the observed data differ significantly from what we would expect under the null hypothesis. The most common threshold is 0.05, but is this always the best choice? Let's delve into this question.

## The Origin of the 0.05 Threshold

The 0.05 threshold for significance was proposed by the British statistician Ronald Fisher in the 1920s. The threshold is somewhat arbitrary: it is a convention, not a biological or natural law. Choosing 0.05 means that, if the null hypothesis is true, a p-value below 0.05 (that is, data extreme enough to reject) occurs only 5% of the time. So if we reject the null hypothesis whenever the p-value is less than 0.05, we accept a 5% risk of committing a Type I error (i.e., rejecting the null hypothesis when it is true). This level of risk is deemed acceptable in many fields of study.
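The 5% Type I error rate can be checked by simulation. The sketch below is illustrative only: it assumes a one-sample, two-sided z-test with a known standard deviation of 1, and the function names, sample size, and number of simulated experiments are hypothetical choices, not part of the original tutorial. Every simulated experiment draws data for which the null hypothesis is true, so each rejection is a false positive.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sided_z_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming a known standard deviation."""
    z = fmean(sample) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, n_experiments=10_000, n=30, seed=0):
    """Fraction of simulated experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The data come from a normal distribution with mean 0, so H0 holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_z_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_experiments


rate = type_i_error_rate()
print(f"Share of true-null experiments rejected at alpha=0.05: {rate:.3f}")
```

The printed share should land close to 0.05, and rerunning with a different `alpha` should move it accordingly.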

## When to Use the 0.05 Threshold?

For most scientific studies, especially those meant for publication, the 0.05 threshold is appropriate because it is widely accepted and expected by the scientific community. However, this doesn't mean it should be used in every case. Depending on the context, a smaller or larger threshold could be more suitable.

For instance, in exploratory studies where you're trying to identify trends or generate hypotheses, a larger threshold could be used. In such cases, the goal is not to draw definitive conclusions but to identify potential areas of interest to investigate in more depth later. Conversely, when the stakes are high (e.g., clinical trials for a new drug), a smaller threshold is more appropriate because it reduces the risk of Type I errors.
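One way to see why the threshold depends on context is that a stricter alpha lowers the chance of a false positive but also lowers the chance of detecting a real effect (the statistical power). The sketch below uses the standard power formula for a two-sided, one-sample z-test. The effect size of 0.3 standard deviations and the sample size of 50 are hypothetical values chosen only for illustration, and `power_two_sided_z` is not a function from the original tutorial.

```python
import math
from statistics import NormalDist


def power_two_sided_z(alpha, effect_size, n):
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean shift in standard-deviation units;
    the standard deviation is assumed known.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


EFFECT_SIZE = 0.3  # hypothetical true effect
N = 50             # hypothetical sample size

powers = {
    alpha: power_two_sided_z(alpha, EFFECT_SIZE, N)
    for alpha in (0.10, 0.05, 0.01, 0.001)
}
for alpha, power in powers.items():
    print(f"alpha={alpha:<6} power={power:.3f}")
```

Under these assumptions, power falls as alpha shrinks. An exploratory study can tolerate a looser threshold to avoid missing leads, while a high-stakes study pays for its stricter threshold with lower power unless the sample size is increased.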

## Effect Size Matters

Another important point to consider when choosing a threshold for significance is the effect size. Effect size measures the strength of the relationship between two variables, or the magnitude of the difference between groups. A small p-value doesn't necessarily mean that the effect is large. For example, in a large study, a very small difference between two groups could be statistically significant but not practically significant. In such cases, we should also look at the effect size to determine whether the difference matters in practice. In other words, a small p-value (e.g., less than 0.05) is not enough on its own; the effect size should also be large enough to matter.
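The interplay between sample size, effect size, and p-value can be sketched numerically. The example below assumes a two-sample z-test with equal group sizes and a known, equal standard deviation, so the observed difference can be expressed directly as a standardized effect size (Cohen's d). The value d = 0.02 and the two sample sizes are hypothetical, picked only to show the contrast.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(cohens_d, n_per_group):
    """Two-sided p-value for an observed standardized difference cohens_d
    between two equal-size groups, assuming a known, equal standard deviation."""
    z = cohens_d / math.sqrt(2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(z)))


TINY_EFFECT = 0.02  # hypothetical difference of 0.02 standard deviations

p_small_study = two_sample_z_p_value(TINY_EFFECT, n_per_group=100)
p_large_study = two_sample_z_p_value(TINY_EFFECT, n_per_group=100_000)

print(f"d={TINY_EFFECT}, n=100 per group:     p={p_small_study:.3f}")
print(f"d={TINY_EFFECT}, n=100000 per group:  p={p_large_study:.2e}")
```

The same tiny effect is far from significant in the small study and clears the 0.05 threshold easily in the large one, even though its practical importance has not changed.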

## Extraordinary Claims Require Extraordinary Evidence

If you're making an extraordinary claim, you need to provide extraordinary evidence to back up your claim. In terms of statistical evidence, this means that you need a very small p-value. For instance, if you're claiming that there are extraterrestrials flying around New York City, a p-value of 0.05 is not going to be convincing. You would need a much smaller p-value to convince people of such an extraordinary claim.
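One hedged way to make this intuition concrete is a Bayesian framing, which is an added illustration rather than something the original tutorial states. If a claim has prior probability `prior` of being true, a test has a given `power`, and we reject at threshold `alpha`, then the probability that a significant result reflects a true claim is `power * prior / (power * prior + alpha * (1 - prior))`. The priors and the power of 0.8 below are hypothetical numbers chosen only to show the pattern.

```python
def prob_claim_true_given_significant(prior, alpha, power=0.8):
    """Probability that a claim is true given a significant result.

    prior: assumed probability that the claim is true before seeing data
    alpha: significance threshold (false positive rate when the claim is false)
    power: assumed probability of a significant result when the claim is true
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# A plausible claim (hypothetical 50% prior) tested at the usual threshold.
plausible = prob_claim_true_given_significant(prior=0.5, alpha=0.05)

# An extraordinary claim (hypothetical one-in-a-million prior).
extraordinary_loose = prob_claim_true_given_significant(prior=1e-6, alpha=0.05)
extraordinary_strict = prob_claim_true_given_significant(prior=1e-6, alpha=1e-9)

print(f"Plausible claim, alpha=0.05:      {plausible:.3f}")
print(f"Extraordinary claim, alpha=0.05:  {extraordinary_loose:.6f}")
print(f"Extraordinary claim, alpha=1e-9:  {extraordinary_strict:.3f}")
```

Under these assumptions, a significant result at 0.05 makes the plausible claim very likely true but leaves the extraordinary claim almost certainly false; only a far stricter threshold changes that.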

## Conclusion

While the 0.05 threshold for significance is commonly used and generally acceptable, it is not universally applicable. It's important to consider the context of the study and the potential consequences of Type I errors when choosing a threshold for significance. Additionally, a small p-value alone is not enough; a substantial effect size is also important. Lastly, extraordinary claims require extraordinary evidence, meaning very small p-values.

## References

1. Fisher, R.A. (1925). Statistical methods for research workers. Oliver & Boyd.
2. Sullivan, G. M., & Feinn, R. (2012). Using effect size—or why the p value is not enough. Journal of graduate medical education, 4(3), 279-282.
3. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129-133.