# Significance

## Inference for Other Estimators

* The methods below apply to any point estimate with a **nearly normal sampling distribution**, for example:
    * Sample mean $\bar{x}$
    * Difference between sample means $\bar{x}_1 - \bar{x}_2$
    * Sample proportion $\hat{p}$
    * Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
* **Unbiased estimator**
    * An important assumption about the point estimates is that they're unbiased, i.e. the sampling distribution of the estimate is centered at the true population parameter it estimates.
    * An unbiased estimate does not systematically over- or underestimate the parameter; on average, it equals the true value.
    * We know that the sample mean is an example of an unbiased point estimate.
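
The unbiasedness of the sample mean can be illustrated with a small simulation. This is only a sketch: the population values `mu` and `sigma`, the sample size `n`, and the number of repetitions are arbitrary choices for illustration, not values from the notes.

```r
# Simulate many samples from a normal population with known mean (assumed values)
set.seed(42)
mu <- 10      # true population mean (hypothetical)
sigma <- 3    # population standard deviation (hypothetical)
n <- 30       # size of each sample

# Compute the sample mean of 5000 independent samples
sample_means <- replicate(5000, mean(rnorm(n, mean = mu, sd = sigma)))

# The sampling distribution of x_bar should be centered close to mu
mean(sample_means)
```

If the estimator is unbiased, the average of the simulated sample means should sit very close to `mu`.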
    
### Confidence Intervals
* **Confidence intervals** for nearly normal point estimates
    $$ \text{point estimate} \pm z^{\star} \times SE $$
    
### Hypothesis Testing
* **Hypothesis testing** for nearly normal point estimates
    $$ Z = \frac{\text{ point estimate } - \text{ null value }}{SE} $$

**Example**
* A 2010 Pew Research foundation poll indicates that among 1,099 college graduates, 33% watch The Daily Show, an American late-night TV show. The standard error of this estimate is 0.014.
* We are asked to estimate the 95% confidence interval for the proportion of college graduates who watch The Daily Show.

In [1]:
p <- 0.33
se <- 0.014

p + 1.96 * se
p - 1.96 * se
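
The same calculation can be wrapped in a small helper so the critical value is derived from the confidence level instead of being typed in. This is a minimal sketch; `normal_ci` is a hypothetical name introduced here, not a function from any package.

```r
# Hypothetical helper: confidence interval for a nearly normal point estimate
normal_ci <- function(estimate, se, conf_level = 0.95) {
  # Critical value z* leaves (1 - conf_level) / 2 in each tail
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  c(lower = estimate - z_star * se, upper = estimate + z_star * se)
}

# Daily Show example: point estimate 0.33, standard error 0.014
ci_daily_show <- normal_ci(0.33, 0.014)
ci_daily_show
```

Under these assumptions, we are 95% confident that roughly 30% to 36% of college graduates watch The Daily Show.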

**Example**

* The third National Health and Nutrition Examination Survey (NHANES) collected body fat percentage and gender data from over 13,000 subjects aged 20 to 80. The average body fat percentage for the 6,580 men in the sample was 23.9%, and for the 7,021 women it was 35%. The standard error for the difference between the average male and female body fat percentages was 0.114.
* Do these data provide convincing evidence that men and women have different average body fat percentages? You may assume that the distribution of the point estimate is nearly normal. 

In [11]:
# H0: mu_men = mu_women  -> mu_men - mu_women = 0
# HA: mu_men != mu_women -> mu_men - mu_women != 0

# Point estimate: x_bar_men - x_bar_women
(point_est <- 23.9 - 35)

null_value <- 0
se <- 0.114

(z <- (point_est - null_value) / se)

# Two-sided p-value
(p_value <- 2 * pnorm(-abs(z)))

# The p-value is essentially 0, so we reject the null hypothesis.
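
Written out with the test-statistic formula from the Hypothesis Testing section, the calculation in this cell is:

$$ Z = \frac{(23.9 - 35) - 0}{0.114} = \frac{-11.1}{0.114} \approx -97.4 $$

A point estimate that lies about 97 standard errors from the null value gives a two-sided p-value that is effectively 0, which is why the null hypothesis is rejected.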

## Decision Errors

* **Type I error** ($\alpha$) is rejecting the H0 when H0 is true. (Declaring the defendant guilty when they are actually innocent.)
* **Type II error** ($\beta$) is failing to reject H0 when HA is true. (Declaring the defendant innocent when they are actually guilty.)
* **Power** ($1 - \beta$) of a test is the probability of correctly rejecting H0 when HA is true.


* We (almost) never know if H0 or HA is true, but we need to consider all possibilities.

### Type I Error Rate
* We reject the null hypothesis when the p-value is less than 0.05 ($\alpha = 0.05$). 
* This means that, for those cases where the null hypothesis is actually true, we do not want to incorrectly reject it more than 5% of those times.
* In other words, when using a 5% significance level, there is about a 5% chance of making a Type I error if the null hypothesis is true. 

$$ P(\text{Type I error } | H_0 \text{ true}) = \alpha $$

* This is why we prefer small values of $\alpha$ - increasing $\alpha$ increases the Type I error rate.
* If Type I Error is dangerous or especially costly, choose a small significance level (e.g. 0.01). Goal: We want to be very cautious about rejecting H0, so we demand very strong evidence favoring HA before we should do so.
* If Type II Error is relatively more dangerous or much more costly, choose a higher significance level (e.g. 0.10). Goal: We want to be cautious about failing to reject H0 when the null is actually false.
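
The relationship $P(\text{Type I error } | H_0 \text{ true}) = \alpha$ can be checked numerically for a two-sided Z test. The sketch below assumes the test statistic is exactly standard normal under H0; the simulation size and seed are arbitrary.

```r
alpha <- 0.05
z_star <- qnorm(1 - alpha / 2)        # two-sided critical value

# Exact probability of rejecting H0 when H0 is true: P(|Z| > z_star)
type1_rate <- 2 * pnorm(-z_star)
type1_rate

# Simulation check: draw Z statistics under H0 and count rejections
set.seed(1)
z_sim <- rnorm(100000)
sim_rate <- mean(abs(z_sim) > z_star)
sim_rate
```

Both numbers should be close to `alpha`, so lowering `alpha` directly lowers how often a true null hypothesis is rejected.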

### Type II Error Rate

* If the alternative hypothesis is actually true, what is the chance that we make a type two error? In other words, what is the chance that we fail to reject the null hypothesis, even when we should reject it?
* The answer to this is not obvious. 
* If the true population average is very close to the null value, it will be very difficult to detect a difference and to reject the null hypothesis. 
* Conversely, if the true population average is very different from the null value, it will be much easier to detect a difference.
* Clearly then, $\beta$, the probability of making a Type II error, depends on the effect size. An **effect size** is defined as the difference between the point estimate and the null value.
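
To make the dependence of $\beta$ on the effect size concrete, here is a rough sketch for a two-sided Z test. The helper `type2_rate` is a hypothetical name, `effect` is taken to be the true difference between the population parameter and the null value, and the standard error of 1 is an arbitrary choice.

```r
# Hypothetical helper: Type II error rate of a two-sided Z test
type2_rate <- function(effect, se, alpha = 0.05) {
  z_star <- qnorm(1 - alpha / 2)   # critical value
  shift <- effect / se             # true effect measured in standard errors
  # Power: probability that Z falls in either rejection region
  power <- pnorm(-z_star + shift) + pnorm(-z_star - shift)
  1 - power
}

effects <- c(0.5, 1, 2, 3, 4)      # assumed true effects
betas <- type2_rate(effects, se = 1)
round(betas, 3)
```

The values shrink as the effect grows: a large true difference is rarely missed, while a difference close to the null value is missed most of the time.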

## Significance vs Confidence Level

* Broadly we can say that a significance level and a confidence level are complements of each other.
* A two-sided hypothesis test with significance level $\alpha$ is equivalent to a confidence interval with $CL = 1 - \alpha$.
* A one-sided hypothesis test with significance level $\alpha$ is equivalent to a confidence interval with $CL = 1 - (2 \times \alpha)$.
* If H0 is rejected, a confidence interval that agrees with the result of the hypothesis test should not include the null value.
* If we fail to reject H0, a confidence interval that agrees with the result of the hypothesis test should include the null value.
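
The agreement between a two-sided test at $\alpha = 0.05$ and a 95% confidence interval can be checked on the body fat example from earlier in these notes. This is a sketch that reuses the numbers given there; the variable names are chosen here for illustration.

```r
point_est <- 23.9 - 35    # difference in average body fat (men - women)
se <- 0.114
null_value <- 0
alpha <- 0.05

# Two-sided hypothesis test
z <- (point_est - null_value) / se
p_value <- 2 * pnorm(-abs(z))
reject_h0 <- p_value < alpha

# Matching confidence interval with CL = 1 - alpha
z_star <- qnorm(1 - alpha / 2)
ci <- point_est + c(-1, 1) * z_star * se
ci_excludes_null <- null_value < ci[1] || null_value > ci[2]

ci
c(reject_h0 = reject_h0, ci_excludes_null = ci_excludes_null)
```

Both flags agree: the test rejects H0 and the interval does not contain 0.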

## Statistical vs Practical Significance

* Real differences between the point estimate and the null value are easier to detect with large samples. 
* However, very large samples can make even a tiny difference between the sample mean and the null value (a tiny effect size) statistically significant, even when that difference is not practically significant.
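
For a sample mean this follows directly from the test statistic. Assuming the usual standard error of a mean, $SE = s / \sqrt{n}$, we can write:

$$ Z = \frac{\bar{x} - \text{null value}}{s / \sqrt{n}} = \sqrt{n} \times \frac{\bar{x} - \text{null value}}{s} $$

For a fixed difference and a fixed $s$, $Z$ grows in proportion to $\sqrt{n}$, so the p-value can be made arbitrarily small by collecting more data, regardless of whether the difference matters in practice.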

## Exercises

OpenIntro Statistics, 3rd edition<br>
4.43, 4.45<br>
4.29, 4.31, 4.47

**4.43 Spam mail counts.** 
* The 2004 National Technology Readiness Survey sponsored by the
Smith School of Business at the University of Maryland surveyed 418 randomly sampled Americans,
asking them how many spam emails they receive per day. The survey was repeated on a new
random sample of 499 Americans in 2009.
* (a) What are the hypotheses for evaluating if the average spam emails per day has changed from
2004 to 2009.
* (b) In 2004 the mean was 18.5 spam emails per day, and in 2009 this value was 14.9 emails per
day. What is the point estimate for the difference between the two population means?
* (c) A report on the survey states that the observed difference between the sample means is not
statistically significant. Explain what this means in context of the hypothesis test and data.
* (d) Would you expect a confidence interval for the difference between the two population means
to contain 0? Explain your reasoning.

In [1]:
# (a)
# H0: mu_2009 - mu_2004 = 0
# HA: mu_2009 - mu_2004 != 0

In [3]:
# (b)
# x_bar_2009 - x_bar_2004, in the same order as the hypotheses in (a)
(point_est <- 14.9 - 18.5)

In [4]:
# (c)
# It means that, assuming the null hypothesis is true (the average number
# of spam emails per day is the same in 2004 and 2009), the probability of
# observing a sample difference at least as extreme as 3.6 emails per day
# is higher than the significance level alpha, i.e. such a difference is
# not unusual. Hence, we fail to reject the null hypothesis: the data do
# not provide convincing evidence that the average number of spam emails
# per day has changed.

In [5]:
# (d)
# Yes. Since the result is not statistically significant, we fail to
# reject the null hypothesis. Hence, we would expect 0 to be
# included in the confidence interval, i.e. 0 is a plausible value
# for the difference between the two population means.

**4.45 Spam mail percentages.** 
* The National Technology Readiness Survey sponsored by the
Smith School of Business at the University of Maryland surveyed 418 randomly sampled Americans,
asking them how often they delete spam emails. In 2004, 23% of the respondents said they delete
their spam mail once a month or less, and in 2009 this value was 16%.
* (a) What are the hypotheses for evaluating if the proportion of those who delete their email once
a month or less has changed from 2004 to 2009?
* (b) What is the point estimate for the difference between the two population proportions?
* (c) A report on the survey states that the observed decrease from 2004 to 2009 is statistically
significant. Explain what this means in context of the hypothesis test and the data.
* (d) Would you expect a confidence interval for the difference between the two population proportions to contain 0? Explain your reasoning.


In [6]:
# (a)
# H0: p_2004 = p_2009
# HA: p_2004 != p_2009

In [7]:
# (b)
(p <- 0.16 - 0.23)

In [8]:
# (c) 
# It means that, assuming the null hypothesis is true, the probability
# of observing a difference at least as extreme as 0.07 is smaller than
# the significance level. Hence we reject the null hypothesis
# and conclude that the data provide evidence of a difference
# between the two population proportions that is unlikely to be due
# to sampling variability alone.

In [9]:
# (d)
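# No. We rejected the null hypothesis, so 0 is not a plausible value for
# the difference and we would not expect the confidence interval to contain it.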

**4.29 Testing for Fibromyalgia.** 
* A patient named Diana was diagnosed with Fibromyalgia, a
long-term syndrome of body pain, and was prescribed anti-depressants. Being the skeptic that she
is, Diana didn’t initially believe that anti-depressants would help her symptoms. However after
a couple months of being on the medication she decides that the anti-depressants are working,
because she feels like her symptoms are in fact getting better.
* (a) Write the hypotheses in words for Diana’s skeptical position when she started taking the
anti-depressants.
* (b) What is a Type 1 Error in this context?
* (c) What is a Type 2 Error in this context?

In [10]:
# (a)
# H0: The anti-depressants are not working.
# HA: The anti-depressants are working.

In [12]:
# (b)
# Type 1 error is to declare the anti-depressants to be working 
# when the truth is they are useless.

In [13]:
# (c)
# Type 2 error is to declare the anti-depressants to be useless
# when the truth is they are helping the symptoms.

**4.31 Which is higher?** 
* In each part below, there is a value of interest and two scenarios (I and
II). For each part, report if the value of interest is larger under scenario I, scenario II, or whether
the value is equal under the scenarios.
* (a) The standard error of $\bar{x}$ when s = 120 and (I) n = 25 or (II) n = 125.
* (b) The margin of error of a confidence interval when the confidence level is (I) 90% or (II) 80%.
* (c) The p-value for a Z-statistic of 2.5 when (I) n = 500 or (II) n = 1000.
* (d) The probability of making a Type 2 Error when the alternative hypothesis is true and the
significance level is (I) 0.05 or (II) 0.10.

In [14]:
# (a)
# SE of the mean is s / sqrt(n), so it decreases as n grows. SE is larger in (I).
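
The two standard errors in part (a) can be computed directly from $SE = s / \sqrt{n}$. A quick check, with variable names chosen here for illustration:

```r
s <- 120

se_I <- s / sqrt(25)     # scenario (I): n = 25
se_II <- s / sqrt(125)   # scenario (II): n = 125

c(I = se_I, II = se_II)
```

Multiplying the sample size by 5 divides the standard error by $\sqrt{5}$, not by 5.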

In [16]:
# (b)
# The margin of error is z* times the standard error, where the critical
# value z* is determined by the confidence level.
qnorm(1 - (1 - .9) / 2)
qnorm(1 - (1 - .8) / 2)
# A higher confidence level yields a larger z* and hence a larger
# margin of error. (I) has a higher margin of error.
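
Part (c) can be reasoned about in the same way. The sketch below assumes the p-value is taken from the standard normal tail, in which case it depends only on the Z-statistic.

```r
# (c)
z <- 2.5

# The p-value is computed from Z alone; n does not appear anywhere
p_one_sided <- pnorm(z, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided

c(one_sided = p_one_sided, two_sided = p_two_sided)
```

Since the sample size is already absorbed into the Z-statistic through the standard error, the p-value is equal under (I) and (II).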

In [21]:
# (d)
# Type 2 error is failing to reject the null hypothesis when the
# alternative hypothesis is true. In other words,
# declaring a person innocent when he's actually guilty.
# When the significance level is small, it's harder to declare a person guilty,
# i.e. to reject the null hypothesis. Hence, we are more likely to
# declare a person innocent when he's guilty.
# (I) has a higher probability of making a Type 2 error.
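
The answer to (d) can be illustrated numerically for a two-sided Z test. This is only a sketch: `beta_for` is a hypothetical helper, and the true effect of 2 standard errors is an arbitrary assumption made for the comparison.

```r
shift <- 2   # assumed true effect, measured in standard errors (hypothetical)

# Hypothetical helper: Type 2 error rate of a two-sided Z test
beta_for <- function(alpha, shift) {
  z_star <- qnorm(1 - alpha / 2)
  power <- pnorm(-z_star + shift) + pnorm(-z_star - shift)
  1 - power
}

beta_I <- beta_for(0.05, shift)    # scenario (I)
beta_II <- beta_for(0.10, shift)   # scenario (II)

c(I = beta_I, II = beta_II)
```

The stricter significance level in (I) gives the larger Type 2 error rate, in line with the reasoning above.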

**4.47 Practical vs. statistical.** 
* Determine whether the following statement is true or false, and
explain your reasoning: “With large sample sizes, even small differences between the null value
and the point estimate can be statistically significant.”

In [39]:
# True. 
n1 <- 1000     # large sample
n2 <- 50       # small sample
effect <- 1    # small difference between the point estimate and the null value
sd <- 10

# Large sample
se1 <- sd / sqrt(n1)
z1 <- effect / se1
pnorm(z1, lower.tail = FALSE)

# Small sample
se2 <- sd / sqrt(n2)
z2 <- effect / se2
pnorm(z2, lower.tail = FALSE)

# With a larger sample the standard error is smaller, so the same
# difference gives a larger test statistic and a smaller p-value.