# Hypothesis Testing

## Hypothesis Testing Framework

* We start with a **null hypothesis (H0)** that represents the status quo.
* We also have an **alternative hypothesis (HA)** that represents our research question, i.e. what we're testing for.
* We conduct a hypothesis test under the assumption that the null hypothesis is true, either via
simulation or theoretical methods (methods that rely on the Central Limit Theorem, CLT).
* If the test results suggest that the data do not provide convincing evidence for the alternative
hypothesis, we stick with the null hypothesis. If they do, then we reject the null hypothesis in
favor of the alternative.

## Hypothesis Testing (for a Mean)

### Hypothesis
* null - H0 : Often either a skeptical perspective or a claim to be tested (=)
* alternative - HA : Represents an alternative claim under consideration and is often represented by a range of possible parameter values. (<, >, !=)
* The skeptic will not abandon H0 unless the evidence in favor of HA is so strong that she rejects H0 in favor of HA.
* Hypotheses are always about population parameters, never about sample statistics.

### P-Value

* **P(observed or more extreme outcome | H0 true)**
* We use the test statistic to calculate the p-value: the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis were true.
* If the p-value is low (lower than the **significance level** $ \alpha $, which is commonly set to 5%), we say that it would be very unlikely to observe the data if the null hypothesis were true, and hence **reject H0**.
* If the p-value is high (higher than $ \alpha $) we say that it is likely to observe the data even if the null
hypothesis were true, and hence **do not reject H0**.
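The decision rule above can be captured in a small helper. This is a minimal sketch, assuming a normal (z) test statistic; the function names `p_value_z` and `reject_h0` and the `alternative` argument are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: p-value for a z test statistic under the standard normal model.
# alternative = "greater"   -> HA: mu > null value (upper tail)
# alternative = "less"      -> HA: mu < null value (lower tail)
# alternative = "two.sided" -> HA: mu != null value (both tails)
p_value_z <- function(z, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         greater   = pnorm(z, lower.tail = FALSE),
         less      = pnorm(z),
         two.sided = 2 * pnorm(-abs(z)))
}

# Hypothetical decision rule: reject H0 when the p-value is below alpha
reject_h0 <- function(p_value, alpha = 0.05) {
  p_value < alpha
}

p <- p_value_z(0.81, "greater")
round(p, 3)    # 0.209
reject_h0(p)   # FALSE: do not reject H0
```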

**Example**
* Earlier, we calculated a 95% confidence interval of 2.7 to 3.7 for the average number of exclusive relationships college students have been in, based on a random sample of 50 college students with a sample mean of 3.2.
* Based on this confidence interval, do these data support the hypothesis that college students on average have been in more than three exclusive relationships?

$$
P(\bar{x} > 3.2 | H_0 : \mu = 3)
$$

```r
# H0: mu = 3
# College students have been in 3 exclusive relationships, on average.

# HA: mu > 3
# College students have been in more than 3 exclusive relationships, on average.

# P(x_bar > 3.2 | H0: mu = 3)

# Since we assume H0 is true, we can construct the sampling distribution based on the CLT:
# x_bar ~ N(mean = 3, SE = 0.246)

# Test statistic (z-score under the normal model)
(z <- round((3.2 - 3) / 0.246, 2))

# p-value: upper-tail area beyond z, since HA is mu > 3
(p_value <- round(pnorm(z, lower.tail = FALSE), 2))

# Since the p-value (0.21) is higher than alpha = 0.05, we do not reject H0.
```

```r
# If in fact college students have been in 3 exclusive relationships on average (H0 true),
# there is a 21% (0.21) chance that a random sample of 50 college students would yield
# a sample mean of 3.2 or higher.

# This is a high probability, so a sample mean of 3.2 or more exclusive relationships
# could easily happen simply by chance (sampling variability).
```
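The framework at the top of this section notes that the null distribution can also be obtained by simulation. The sketch below is illustrative only: it assumes the population of relationship counts is normal with mean 3 (H0) and standard deviation 1.74, a value chosen here only because 1.74 / sqrt(50) ≈ 0.246 matches the standard error used above. Real relationship counts are discrete, so this is a simplification.

```r
set.seed(2024)  # for reproducibility

n      <- 50     # sample size
mu0    <- 3      # null value
s_pop  <- 1.74   # ASSUMED population sd (implies SE of about 0.246)
x_obs  <- 3.2    # observed sample mean
n_sims <- 10000  # number of simulated samples under H0

# Simulate the sampling distribution of x_bar assuming H0 is true
sim_means <- replicate(n_sims, mean(rnorm(n, mean = mu0, sd = s_pop)))

# Simulation-based p-value: share of simulated means at least as large as observed
p_sim <- mean(sim_means >= x_obs)

# Theoretical (CLT-based) p-value for comparison
p_theory <- pnorm(x_obs, mean = mu0, sd = s_pop / sqrt(n), lower.tail = FALSE)

c(simulation = p_sim, theory = p_theory)
```

Both values should land close to the 0.21 computed above; increasing `n_sims` makes the simulated value less noisy.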

### Two-Sided Tests

* Often instead of looking for a divergence from the null in a specific direction, we might be interested
in divergence in any direction.
* We call such hypothesis tests **two-sided** (or **two-tailed**).
* The definition of a p-value is the same for one- and two-sided tests. However, the calculation is slightly more involved, since "at least as extreme as the observed outcome" must now be considered in both directions away from the null value.
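Because the normal sampling distribution is symmetric about the null value, the two tails have equal area, which gives a shortcut for the two-sided p-value. Writing $\Phi$ for the standard normal CDF (notation introduced here):

$$
\text{p-value} = P(Z \ge |z|) + P(Z \le -|z|) = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$

For the example below, $z = (3.2 - 3)/0.246 \approx 0.81$, so the p-value is $2\,(1 - \Phi(0.81)) \approx 2(0.209) = 0.418$.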

**Example**

$$
P(\bar{x} > 3.2  \text{ OR }  \bar{x} < 2.8 | H_0 : \mu = 3)
$$

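```r
# Test statistics for the observed deviation (|3.2 - 3| = 0.2) in each direction
(z_upper <- round((3.2 - 3) / 0.246, 2))
(z_lower <- round((2.8 - 3) / 0.246, 2))

# Tail areas: above z_upper and below z_lower
(p_upper <- round(1 - pnorm(z_upper), 3))
(p_lower <- round(pnorm(z_lower), 3))

# Two-sided p-value is the sum of both tails
(p_value <- p_upper + p_lower)
```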

### Step-by-Step

1. Set the **hypotheses**
    * $H_0 : \mu = \text{null value}$
    * $H_A: \mu < \text{null value}$, $\mu > \text{null value}$, or $\mu \neq \text{null value}$
2. Calculate the **point estimate**: $\bar{x}$
3. Check **conditions**
    * Independence
    * Sample size/skew
4. Draw the **sampling distribution**, shade the **p-value** area, and calculate the **test statistic**
    * $ z = \frac{\bar{x} - \mu}{SE} $
    * $ SE = \frac{s}{\sqrt{n}} $
5. Make a decision, and interpret it in context of the research question
    * If p-value < $\alpha$, reject $H_0$; the data provide convincing evidence for $H_A$.
    * If p-value > $\alpha$, fail to reject $H_0$; the data do not provide convincing evidence for $H_A$.
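Steps 4 and 5 can be sketched as a single R function working from summary statistics. This is a minimal sketch, assuming the conditions in step 3 have already been checked and that the normal (z) model applies; the function name `z_test_mean` and its arguments are hypothetical, not part of any package. Base R's `t.test()` works from raw data and uses the t distribution, so it is not a drop-in replacement here.

```r
# Hypothetical helper: one-sample z test for a mean from summary statistics.
# Assumes the independence and sample size/skew conditions already hold.
z_test_mean <- function(x_bar, s, n, mu0,
                        alternative = c("two.sided", "greater", "less"),
                        alpha = 0.05) {
  alternative <- match.arg(alternative)

  # Step 4: standard error and test statistic
  se <- s / sqrt(n)
  z  <- (x_bar - mu0) / se

  # p-value depends on the direction of HA
  p_value <- switch(alternative,
                    greater   = pnorm(z, lower.tail = FALSE),
                    less      = pnorm(z),
                    two.sided = 2 * pnorm(-abs(z)))

  # Step 5: decision at significance level alpha
  list(se = se, z = z, p_value = p_value,
       decision = if (p_value < alpha) "reject H0" else "fail to reject H0")
}

# Gifted-children example: x_bar = 118.2, s = 6.5, n = 36, mu0 = 100, alpha = 0.01
res <- z_test_mean(118.2, 6.5, 36, 100, alternative = "two.sided", alpha = 0.01)
res$decision  # "reject H0"
```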

**Example**
* Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of 36 children who were identified as gifted children soon after they reached the age of four. 
* In this study, along with variables on the children, the researchers also collected data on their mothers' IQ scores. A histogram showed the distribution of these data; the sample statistics are:
    * n = 36
    * min = 101
    * mean = 118.2
    * sd = 6.5
    * max = 131
* Perform a hypothesis test to evaluate whether these data provide convincing evidence of a difference between the average IQ score of mothers of gifted children and the average IQ score for the population at large, which is 100. Use a significance level of 0.01.

```r
# (1) Set the hypotheses
# H0: mu = 100
# HA: mu != 100
```

```r
# (2) Calculate the point estimate
x_bar <- 118.2
```

```r
# (3) Check the conditions
# Random sample & 36 < 10% of all gifted children -> independence
# n > 30 & sample not skewed -> nearly normal sampling distribution
```

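```r
# (4) Sampling distribution, p-value, test statistic
# Standard error of the mean: s / sqrt(n)
(se <- round(6.5 / sqrt(36), 3))

# Observed deviation from the null value is 118.2 - 100 = 18.2; the mirror
# point on the other side of 100 is 100 - 18.2 = 81.8
(z_upper <- (118.2 - 100) / se)
(z_lower <- (81.8 - 100) / se)

# Two-sided p-value: area above z_upper plus area below z_lower
(p_value <- pnorm(z_upper, lower.tail = FALSE) + pnorm(z_lower))
```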

```r
# (5) Make a decision
# The p-value is essentially 0, far below alpha = 0.01 -> strong evidence against H0.
# We reject H0 and conclude that the data provide convincing evidence of a difference between
# the average IQ score of mothers of gifted children and the average IQ score for the population at large.
```
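As a cross-check, a two-sided test at significance level $\alpha$ agrees with a $(1 - \alpha)$ confidence interval: H0 is rejected exactly when the null value lies outside the interval. A minimal sketch for this example, using a 99% interval to match $\alpha = 0.01$:

```r
x_bar <- 118.2
se    <- 6.5 / sqrt(36)   # standard error of the mean
alpha <- 0.01

# Critical value for a 99% confidence interval (about 2.58)
z_star <- qnorm(1 - alpha / 2)

# 99% confidence interval for the mean IQ of mothers of gifted children
(ci <- x_bar + c(-1, 1) * z_star * se)

# The null value 100 lies outside the interval, consistent with rejecting H0
100 < ci[1] || 100 > ci[2]
```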

**Example**
* A statistics student interested in sleep habits of domestic cats took a random sample of 144 cats and monitored their sleep. The cats slept an average of 16 hours per day. According to our online resources, domestic dogs actually sleep on average 14 hours a day. 
* We want to find out if these data provide convincing evidence of a difference between domestic cats and dogs with respect to how much they sleep. The test statistic was calculated to be 1.73.
* What is the interpretation of the p-value in the context of these data?

```r
# H0: mu = 14
# HA: mu != 14

# Two-sided p-value: by symmetry, twice the lower-tail area below -|z| = -1.73
(p_value <- pnorm(-1.73) * 2)
```

```r
# P(obtaining a random sample of 144 cats that sleep 16 hours or more, or 12 hours or less,
# on average, if in fact cats truly slept 14 hours per day on average) = 0.0836

# The p-value is larger than the significance level 0.05, so the evidence is not strong
# enough to reject the null hypothesis. We cannot conclude that there is a difference
# between the sleeping habits of domestic cats and dogs.
# If domestic cats truly sleep 14 hours per day on average, there is about an 8% chance
# that a random sample of 144 cats would yield a sample mean at least as far from 14 as 16
# (i.e., 16 or more, or 12 or less).
```
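The same decision can be reached by comparing the test statistic to a critical value instead of the p-value. A minimal sketch, assuming a two-sided test at $\alpha = 0.05$ as in the interpretation above:

```r
z     <- 1.73   # test statistic given in the problem
alpha <- 0.05

# Two-sided critical value: alpha/2 of the area lies beyond it in each tail
(z_crit <- qnorm(1 - alpha / 2))   # about 1.96

# Reject H0 only if |z| exceeds the critical value
abs(z) > z_crit   # FALSE: do not reject H0, matching p-value = 0.0836 > 0.05
```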