## The Normal Distribution
### sigma, pvalues and one-tailed probability

A two-tailed hypothesis test is designed to show whether a measure is significantly different from the mean of a population, that is, either significantly greater or significantly less than it. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, on the other hand, is set up to show a difference in one direction only: that the measure is higher than the population mean, or that it is lower, but not both.

#### two-tailed probability in standard deviations

| $n$ (number of $\sigma$) | Probability within $\pm n\sigma$ |
| ------- | ----------- |
| 1       | 0.683       |
| 2       | 0.9545      |
| 3       | 0.9973      |
| 4       | 0.9999367   |
| 5       | 0.99999943  |
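
The values in the table can be reproduced from the cumulative distribution function of the standard normal distribution. The following is a minimal sketch using `scipy.stats.norm`; the helper name `two_tailed_coverage` is hypothetical and not part of the original notebook.

```python
from scipy.stats import norm

def two_tailed_coverage(n_sigma):
    """Hypothetical helper: probability that a standard normal
    variable falls within +/- n_sigma of the mean."""
    # cdf(n) - cdf(-n) is the area under the curve between -n and +n.
    return norm.cdf(n_sigma) - norm.cdf(-n_sigma)

# Build the table for 1 to 5 standard deviations.
coverage_table = {n: two_tailed_coverage(n) for n in range(1, 6)}

for n, prob in coverage_table.items():
    print(f"{n} sigma: {prob:.8f}")
```

The printed values should agree with the table above up to the number of digits shown there.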

#### one-tailed pvalues 
For example, the two-tailed probability at 3 $\sigma$ is p = 0.9973, and the corresponding pvalue is its complement: (1 - p) = 0.0027. This pvalue is the probability of a measure falling outside $\pm 3 \sigma$. If one is interested in the one-tailed probability, this number should be halved: (1 - p)/2 = 0.00135.
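
Written out for the $3\sigma$ case, the halving follows from the symmetry of the normal distribution: each tail holds half of the probability outside $\pm 3\sigma$. Here $Z$ denotes a standard normal variable, a symbol introduced only for this illustration:

$$
p_{\text{two-tailed}} = 1 - P(|Z| \le 3) = 1 - 0.9973 = 0.0027,
\qquad
p_{\text{one-tailed}} = P(Z > 3) = \frac{0.0027}{2} = 0.00135
$$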


Let's see some examples of how to convert between standard deviations, probabilities and pvalues.

First the imports.

In [1]:
import numpy as np
from scipy.stats import norm

Let's pick a two-tailed probability, e.g. p = 99.73%, which we know corresponds to $3\sigma$.

In [2]:
p = 0.9973

We want the one-tailed pvalue of that probability, since our use cases are built on one half of a symmetric distribution. We define a function that, given the probability, returns its one-tailed pvalue. Beware of floating-point and rounding errors: the result is rounded to a fixed number of decimals.

In [7]:
def get_pval_from_prob(p, decimals=8):
    # (1 - p) is the area in both tails; half of it is the one-tailed pvalue.
    return np.round((1-p)/2, decimals)

And we find that the one-tailed pvalue of p = 99.73% is:

In [8]:
get_pval_from_prob(p)

0.00135
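
The rounding inside `get_pval_from_prob` matters because `1 - p` is computed in binary floating point and may not be exactly 0.0027. The sketch below, which uses only the standard library, shows the unrounded value next to the rounded one and compares with a tolerance rather than with `==`. The variable names are illustrative.

```python
import math

p = 0.9973

# Unrounded result: may carry a tiny floating-point error in the last digits.
raw_pval = (1 - p) / 2
# Rounded to 8 decimals, as in get_pval_from_prob.
rounded_pval = round(raw_pval, 8)

print(repr(raw_pval))
print(repr(rounded_pval))

# Compare floats with a tolerance instead of exact equality.
print(math.isclose(raw_pval, 0.00135, rel_tol=1e-9))
```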

To close the circle, we now define a function that, given the one-tailed pvalue, returns the two-tailed probability. We simply invert the operation above, remembering to double the pvalue:

In [9]:
def get_prob_from_pvalue(pval, decimals=8):
    # Doubling the one-tailed pvalue gives the area in both tails.
    return np.round(1-pval*2, decimals)

Hence, using the previously found pvalue as input we find the same probability as before:

In [11]:
pval = 0.00135
get_prob_from_pvalue(pval)

0.9973

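What really interests us is the probability of a measure lying inside or outside a given standard deviation threshold. The survival function at that $\sigma$ threshold gives the area of one tail; doubling it gives the probability of lying outside $\pm\sigma$, and subtracting that from 1 gives the two-tailed probability of lying inside. Let's define another function that does this.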

In [12]:
def get_prob_from_sigma(sigma, decimals=8):
    # norm.sf(sigma) is the upper-tail area; twice that lies outside +/- sigma.
    return np.round(1-(norm.sf(sigma)*2), decimals)

And let's verify that we retrieve the correct probability for $3\sigma$, which we know must be p = 99.73%.

In [19]:
sigma = 3
get_prob_from_sigma(sigma, decimals=5)

0.9973

Now, since we actually want to convert standard deviation thresholds directly into pvalues, we can combine the two previously defined functions as follows:

In [14]:
def get_pvalue_from_sigma(sigma, decimals=8):
    # sigma -> two-tailed probability -> one-tailed pvalue
    p = get_prob_from_sigma(sigma, decimals=decimals)
    return get_pval_from_prob(p, decimals=decimals)

And find that the one-tailed pvalue of $3\sigma$ is once again:

In [16]:
get_pvalue_from_sigma(sigma, decimals=5)

0.00135
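
Because the normal distribution is symmetric, the one-tailed pvalue at a threshold is the survival function itself, so the intermediate two-tailed probability is not strictly needed. The sketch below compares the two routes. Both helper names are hypothetical, and `one_tailed_pvalue_via_prob` only mirrors the two-step computation described in the text.

```python
import numpy as np
from scipy.stats import norm

def one_tailed_pvalue_direct(sigma):
    """Hypothetical helper: area of the upper tail beyond `sigma`."""
    return norm.sf(sigma)

def one_tailed_pvalue_via_prob(sigma, decimals=8):
    """Same quantity via the rounded two-tailed probability."""
    p = np.round(1 - 2 * norm.sf(sigma), decimals)
    return np.round((1 - p) / 2, decimals)

# Compare both routes at a moderate and at a large threshold.
for sigma in (3, 5):
    print(sigma, one_tailed_pvalue_direct(sigma), one_tailed_pvalue_via_prob(sigma))
```

At $5\sigma$ the tail area is of order $10^{-7}$, so rounding to 8 decimals keeps only about two significant digits, whereas the direct call is not affected by that rounding.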

This process can, of course, be done in reverse. One must evaluate the percent point function (the inverse cumulative distribution function) at that pvalue. Since the normal distribution is symmetric and the percent point function of a small pvalue is negative, we take the absolute value to obtain the number of standard deviations rather than $\pm n \sigma$.

In [17]:
def get_sigma_from_pvalue(pval, decimals=3):
    # norm.ppf of a small pvalue is negative (lower tail), hence the abs.
    return np.abs(np.round(norm.ppf(pval), decimals))

We can verify that from a pvalue of 0.00135 we once again obtain $3\sigma$.

In [18]:
get_sigma_from_pvalue(pval)

3.0
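
As a final consistency check, the conversion can be run as a round trip for several thresholds. This sketch uses `norm.isf`, the inverse survival function, as an alternative to taking the absolute value of `norm.ppf`. The variable names are illustrative and no rounding is applied.

```python
from scipy.stats import norm

# For each threshold: sigma -> one-tailed pvalue -> sigma.
round_trip = []
for n_sigma in range(1, 6):
    pval = norm.sf(n_sigma)      # upper-tail area beyond n_sigma
    recovered = norm.isf(pval)   # inverse survival function: back to sigma
    round_trip.append((n_sigma, pval, recovered))
    print(f"{n_sigma} sigma -> pvalue {pval:.3e} -> {recovered:.6f} sigma")
```

Each recovered value should match the starting threshold to within floating-point precision.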