# Hypothesis and Inference

*Null hypothesis, H<sub>0</sub>*: the default position, which the test tries to reject.

*Type 1 error* (“false positive”): we reject H<sub>0</sub> even though it’s true.

*Type 2 error* (“false negative”): we fail to reject H<sub>0</sub> even though it’s false.

*Significance* of a test: how willing we are to make a *type 1 error*.

*Power* of a test: the probability of not making a *type 2 error*.

In [2]:
# imports, Python core
import math
from typing import Tuple

# imports, this project
from ch07_hypothesis_and_inference import normal_approximation_to_binomial
from ch07_hypothesis_and_inference import normal_probability_between
from ch07_hypothesis_and_inference import normal_two_sided_bounds

## Statistical Hypothesis Testing

Statistical hypothesis testing is a way to assess whether a certain hypothesis is likely to be true.

In the classical setup, we have a ***null hypothesis*** *H<sub>0</sub>* that represents some default position, and some alternative hypothesis *H<sub>1</sub>* that we’d like to compare it with. We use statistics to decide whether we can reject *H<sub>0</sub>* as false or not.

## Example: Flipping a Coin

Imagine we have a coin and we want to test whether it’s fair. We’ll make the assumption that the coin has some probability *p* of landing heads, and so our null hypothesis is that the coin is fair, that is, that *p* = 0.5. We’ll test this against the alternative hypothesis *p* ≠ 0.5.

***
***Default position/null hypothesis, H<sub>0</sub>: *p* = 0.5***

***Alternative position, H<sub>1</sub>: *p* ≠ 0.5***
***

Our test will involve flipping the coin some number of times, *n*, and counting the number of heads, X.

In particular, let’s say that we choose to flip the coin n = 1000 times. If our hypothesis of fairness is true, X should be distributed approximately normally with mean 500 and standard deviation 15.8:

In [3]:
# Normal approximation to Binomial(n=1000, p=0.5): under H0, X is approximately
# normal with mean 500 and standard deviation 15.8

mu_0, sigma_0 = normal_approximation_to_binomial(1000, 0.5)     # (500, 15.8)
print('flipping a coin 1000 times, with a p=0.5 results in')
print('mean of:', round(mu_0, 2))
print('sigma/standard deviation of:', round(sigma_0, 2))

flipping a coin 1000 times, with a p=0.5 results in
mean of: 500.0
sigma/standard deviation of: 15.81
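
The helper `normal_approximation_to_binomial` is imported from the project module and is not shown in these notes. A minimal sketch of what it presumably computes, assuming the standard formulas mu = n·p and sigma = sqrt(n·p·(1 − p)); the name `binomial_normal_params` is made up for this illustration:

```python
import math
from typing import Tuple


def binomial_normal_params(n: int, p: float) -> Tuple[float, float]:
    """Return (mu, sigma) of the normal approximation to Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu, sigma


mu, sigma = binomial_normal_params(1000, 0.5)
print(round(mu, 2), round(sigma, 2))  # 500.0 15.81
```

The approximation is reasonable here because n is large and p is not close to 0 or 1.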


### Type 1 error (“false positive”) and significance

We need to make a decision about ***significance***: how willing we are to make a *type 1 error* (“false positive”), in which we reject *H<sub>0</sub>* even though it’s true. For reasons lost to the annals of history, this willingness is often set at 5% or 1%. Let’s choose 5%.

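Consider the test that rejects *H<sub>0</sub>* if X falls outside the central 95% interval of that normal distribution, roughly 469 to 531 (computed in the next cell). Assuming *p* really equals 0.5 (i.e., *H<sub>0</sub>* is true), there is just a 5% chance we observe an X that lies outside this interval, which is exactly the significance we wanted. Said differently, if *H<sub>0</sub>* is true, then approximately 19 times out of 20 this test will give the correct result.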

In [5]:
# Consider the test that rejects H0 if X falls outside the bounds given by:

lower, upper = normal_two_sided_bounds(0.95, mu_0, sigma_0)    # (469, 531)
print('We will reject H0 if X falls outside the bounds:', round(lower, 2), '< X <', round(upper, 2))

We will reject H0 if X falls outside the bounds: 469.01 < X < 530.99
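
To see where these bounds come from without the project helpers, here is a sketch that uses `statistics.NormalDist` from the standard library (Python 3.8+) as a stand-in for `normal_two_sided_bounds`. That substitution is an assumption about what the helper computes. The sketch also checks the type 1 error rate against the exact binomial distribution rather than the normal approximation. The names `two_sided_bounds` and `exact_type_1_rate` are illustrative only.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_sided_bounds(probability: float, mu: float, sigma: float) -> Tuple[float, float]:
    """Symmetric bounds around mu that contain `probability` of a Normal(mu, sigma)."""
    tail = (1 - probability) / 2
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


def exact_type_1_rate(n: int, lower: float, upper: float) -> float:
    """P(X outside [lower, upper]) for X ~ Binomial(n, 0.5), i.e. when H0 is true."""
    inside = sum(math.comb(n, k) for k in range(n + 1) if lower <= k <= upper)
    return 1 - inside / 2 ** n


mu_0, sigma_0 = 500.0, math.sqrt(1000 * 0.5 * 0.5)
lower, upper = two_sided_bounds(0.95, mu_0, sigma_0)
print(round(lower, 2), round(upper, 2))  # 469.01 530.99

# Should land close to the 5% significance level chosen above
print(exact_type_1_rate(1000, lower, upper))
```

The exact rate will generally not be precisely 5%, because X only takes integer values while the bounds come from a continuous approximation.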


### Type 2 error (“false negative”) and power

We are also often interested in the ***power*** of a test, which is the probability of not making a *type 2 error* (“false negative”), in which we fail to reject *H<sub>0</sub>* even though it’s false. To measure this, we have to specify what exactly *H<sub>0</sub>* being false means. (Knowing merely that *p* is not 0.5 doesn’t tell us much about the distribution of X.) In particular, let’s check what happens if *p* is really 0.55, so that the coin is slightly biased toward heads.

In that case, we can calculate the power of the test with:

In [7]:

# 95% bounds based on the assumption that p is 0.5
lo, hi = normal_two_sided_bounds(0.95, mu_0, sigma_0)
print("low and high 95% bounds for normal distribution:", round(lo, 2), "and", round(hi, 2))

# actual mu and sigma based on p = 0.55
mu_1, sigma_1 = normal_approximation_to_binomial(1000, 0.55)

# a type 2 error means we fail to reject the null hypothesis,
# which will happen when X is still in our original interval
type_2_probability = normal_probability_between(lo, hi, mu_1, sigma_1)
power = 1 - type_2_probability      # 0.887
print("power is:", round(power, 3))

low and high 95% bounds for normal distribution: 469.01 and 530.99
power is: 0.887
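
The same power calculation can be sketched without the project helpers, again assuming `statistics.NormalDist` as a stand-in for them. The function `power_of_test` is hypothetical and not part of the project module; it also makes it easy to try other values of the true *p*, which these notes do not do.

```python
import math
from statistics import NormalDist


def power_of_test(n: int, p_null: float, p_true: float, confidence: float = 0.95) -> float:
    """Power of the two-sided test of p = p_null when the coin's real bias is p_true."""
    # Acceptance region, computed under the null hypothesis
    null = NormalDist(n * p_null, math.sqrt(n * p_null * (1 - p_null)))
    tail = (1 - confidence) / 2
    lo, hi = null.inv_cdf(tail), null.inv_cdf(1 - tail)

    # Distribution of X if the true probability of heads is p_true
    actual = NormalDist(n * p_true, math.sqrt(n * p_true * (1 - p_true)))

    # A type 2 error happens when X still lands inside the acceptance region
    type_2_probability = actual.cdf(hi) - actual.cdf(lo)
    return 1 - type_2_probability


for p_true in (0.51, 0.55, 0.60):
    print(p_true, round(power_of_test(1000, 0.5, p_true), 3))
```

For `p_true = 0.55` this should agree with the value of about 0.887 noted in the cell above. The further the true *p* is from 0.5, the higher the power, since X is then less likely to land inside the original interval.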

## Confidence Intervals