# Lecture 6 - One-sample z- and t-tests
This notebook contains the conceptual examples we will work through in lecture.  It does not focus on actually "doing" the analysis in a practical application.  Lab 4 covers that and shows you examples of the code you will need to conduct your analyses in HW 4.

In [None]:
# LIBRARIES
library(tidyverse)
library(magrittr) ## for pipe operators
library(pwr) ## for power function and ES.h (Cohen's h)
library(scales) ## for scaling functions for ggplot2
library(gridExtra) # two plots next to each other
library(lsr) ## for Cohen's D
options(repr.plot.width  = 8,
        repr.plot.height = 6)
bold.14.text <- element_text(face = "bold", size = 14)

In [None]:
## DATA
cah_oct <- read_csv("201710-CAH_PulseOfTheNation_Raw.csv")
## variable names currently full questions - need to rename
new_names <- c("income", "gender", "age", "age_cat", "polaffil", "trump", "educ", "race", "whtnat", "whtnat_rep",
              "love_us", "love_us_dem", "helppoor", "helppoor_rep", "racist", "racist_dem", "friendtrump", "civilwar",
              "hunting", "kale", "therock", "trumpvader")
colnames(cah_oct) <- new_names
cah_oct %<>% drop_na(income)
glimpse(cah_oct)

## One-Sample z-test - Conceptual Example
Overall, the mean household income in the United States, according to the US Census Bureau 2014 Annual Social and Economic Supplement, is $72,641. https://en.wikipedia.org/wiki/Household_income_in_the_United_States#Mean_household_income

We want to know whether the average income in our October poll sample is significantly different from the national average of $72,641.

How would we run a hypothesis test for this?

### Step 1 - Formulate Hypothesis

Population mean is $\mu$, which is specified in our question: \$72,641

Sample mean is $\bar{x}$ which we will calculate from our data.

$H_0 : \mu = \$72,641$

$H_A : \mu \neq \$72,641$

**Note:** Given our $H_A$ we're running a two-tailed test.
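
For a two-tailed test, alpha is split evenly between the two tails, which is where the ±1.96 cutoff used in the code that follows comes from. A minimal sketch in base R (the variable names here are illustrative and are not used elsewhere in the notebook):

```r
# Two-tailed test: put alpha / 2 in each tail of the standard normal.
alpha_level <- 0.05
z_crit <- qnorm(1 - alpha_level / 2)   # upper critical value
c(lower = -z_crit, upper = z_crit)     # approximately -1.96 and 1.96
```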



In [None]:
# set up some variables with important values
mu_inc <- 72641
popse <- 3500 ## the population se is 3.5k

# get lower and upper dollar values for null distribution graph.
lower <- mu_inc - 1.96*popse 
upper <- mu_inc + 1.96*popse

In [None]:
#graph the distribution of the population where null is true
# NONE OF THIS GRAPH CODE HERE WILL BE USEFUL FOR YOUR ASSIGNMENTS - THIS IS AN EDUCATIONAL EXAMPLE
z1 <- ggplot(data = data.frame(z = c(-3, 3)), aes(z)) +
      stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + 
      labs(y = "", title = "DISTRIBUTION IF NULL IS TRUE")+
      scale_y_continuous(breaks = NULL, expand = c(0,0)) + theme(text = bold.14.text)
z2 <- z1 + geom_vline(xintercept = 1.96, color = "#00bcd9", size = 2) +
           geom_vline(xintercept = -1.96, color = "#00bcd9", size = 2) 
z3 <- z2 + stat_function(fun = dnorm, xlim = c(-3,-1.96), geom = "area", alpha=0.2, fill = "#00bcd9")+
           stat_function(fun = dnorm, xlim = c(1.96,3), geom = "area", alpha=0.2, fill = "#00bcd9")
z4 <- z3 + annotate(geom="text", x=2, y=.05, label=paste0("if z = 1.96, \nincome = ", dollar(upper)),
                    color = "#4d004d", size = 5) +
           annotate(geom="text", x=-2, y=.05, label=paste0("if z = -1.96, \nincome = ", dollar(lower)),
                    color = "#4d004d", size = 5) +
           annotate(geom="text", x=0, y=.05, label=paste0("Population Mean = ", dollar(mu_inc)),
                    color = "#4d004d", size = 5)
z4

### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

Random and independent sample ->>> Yes

Sample is <10% of the population? ->>> Yes

Because of the Central Limit Theorem, we can assume that the sampling distribution of the mean is approximately normal at this sample size, but let's check the normality of our sample anyway.
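
The role of the Central Limit Theorem can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population is exponential (strongly right-skewed, a bit like income), and the sample size of 200 and the seed are arbitrary choices, not values from the poll.

```r
# Simulation sketch: sample means from a skewed population look roughly normal.
set.seed(6)                      # arbitrary seed, for reproducibility only
n_per_sample <- 200              # hypothetical sample size
sample_means <- replicate(2000, mean(rexp(n_per_sample, rate = 1)))

mean(sample_means)               # close to the population mean of 1
sd(sample_means)                 # close to 1 / sqrt(n_per_sample)
hist(sample_means, breaks = 30,
     main = "Sampling distribution of the mean (simulated)")
```

Even though individual draws are skewed, the histogram of the 2,000 sample means should look roughly symmetric and bell-shaped.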

In [None]:
# create a QQ plot using ggplot, sample in aes is your numerical variable

cah_oct  %>%  ggplot(aes(sample = income)) +
  geom_qq() +
  geom_qq_line() +
  labs(title = "QQ Plot of Income in October Poll")+
    theme(axis.text.x = bold.14.text, 
                      text = bold.14.text)

In [None]:
s1 <- cah_oct %>% ggplot(aes(x = income/1000)) +
                geom_histogram(bins = 20, fill = "magenta") +
      labs(x = "Income in $1000", y = "Frequency",
           title = "Distribution of Income in October Poll") +
      theme(text = bold.14.text)
s2 <- s1 + geom_vline(xintercept = mean(cah_oct$income)/1000, color = "#00bcd9", size = 2) +
           annotate(geom="text", x=155, y=60, 
                    label=paste0("Sample Mean = ", dollar(mean(cah_oct$income))),
                    color = "#009bb3", size = 5) 
s3 <- s2 + geom_vline(xintercept = 72641/1000, color = "#990099", size = 2) +
           annotate(geom="text", x=165, y=50, 
                   label="Population Mean = $72,641",
                   color = "#4d004d", size = 5) 
s1


### Step 3: Calculate z-statistic and p-value

We know what the population mean is (`mu_inc` = $72,641) and what the population standard error is (`popse` = 3,500), so we can use a z-test to see if our sample mean is significantly different from the population mean.

We're doing a two-tailed test.

The formula is:
# $z = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}}$ 

In [None]:
# calculate x_bar, the sample mean
x_bar <- mean(cah_oct$income)
x_bar

In [None]:
# calculate z score (observed)
z_obs <- (x_bar - mu_inc) / popse
z_obs

In [None]:
# add observed z value to null distribution graph
z5 <- z4 + geom_vline(xintercept = z_obs, color = "#66ff33", size = 2) 
z5

### Conceptual Z-test Example - Conclusion
Our sample mean was $74,710.  

It was not significantly different from the population mean of $72,641.  

The z-score associated with the sample mean is 0.59, which does not exceed the critical z-score of ±1.96 for a two-tailed test at alpha = 0.05, so we fail to reject the null.  
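
The same conclusion can be reached through a p-value instead of a critical value. The sketch below plugs in the rounded numbers reported in this conclusion (sample mean \$74,710, population mean \$72,641, standard error \$3,500); the `_demo` names are illustrative only.

```r
# Values taken from the text above; names with a _demo suffix are illustrative.
x_bar_demo <- 74710
mu_demo    <- 72641
se_demo    <- 3500

z_demo <- (x_bar_demo - mu_demo) / se_demo
# Two-tailed p-value: area beyond |z| in both tails of the standard normal.
p_demo <- 2 * pnorm(abs(z_demo), lower.tail = FALSE)
round(c(z = z_demo, p = p_demo), 2)
```

A p-value above 0.05 agrees with the critical-value comparison: we fail to reject the null.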

What if we didn't know the population standard error?  We would need to use a t-test.

## One-sample t-test - Conceptual Example
We're going to use the same setup as above, but this time, since we don't know the population standard error, we need to estimate it with the sample standard error.  Therefore we cannot use the z-test; we have to conduct a t-test instead.  

In [None]:
# to set up a distribution if the null is true we need a few calculations

#sample standard error
sampse <- sd(cah_oct$income) / sqrt(nrow(cah_oct))

# degrees of freedom (n-1)
dof <- nrow(cah_oct) - 1

# critical value for a two-tailed t-test at alpha = 0.05
tcrit <- qt(0.975, df = dof)

lower_t <- mu_inc - tcrit*sampse 
upper_t <- mu_inc + tcrit*sampse

In [None]:
t1 <- ggplot(data.frame(t = c(-3, 3)), aes(x = t)) + 
      stat_function(fun = dt, args = list(df = dof)) + # central t: an unnamed 0.05 here would be read as ncp
      labs(y = "", title = "DISTRIBUTION IF NULL IS TRUE")+
      scale_y_continuous(breaks = NULL, expand = c(0,0)) + theme(text = bold.14.text)
t2 <- t1 + geom_vline(xintercept = tcrit, color = "#00bcd9", size = 2) +
           geom_vline(xintercept = -tcrit, color = "#00bcd9", size = 2) 
t3 <- t2 + stat_function(fun = dt, args = list(df = dof), 
                         xlim = c(-3,-tcrit), geom = "area", alpha=0.2, fill = "#00bcd9")+
           stat_function(fun = dt, args = list(df = dof), 
                         xlim = c(tcrit,3), geom = "area", alpha=0.2, fill = "#00bcd9")
t4 <- t3 + annotate(geom="text", x=2, y=.1, label=paste0("if t = 1.97, \nincome = ", dollar(upper_t)),
                    color = "#4d004d", size = 5) +
           annotate(geom="text", x=-2, y=.1, label=paste0("if t = -1.97, \nincome = ", dollar(lower_t)),
                    color = "#4d004d", size = 5) +
           annotate(geom="text", x=0, y=.05, label=paste0("Population Mean = \n", dollar(mu_inc)),
                    color = "#4d004d", size = 5)
t1

Now to calculate the t-value we use the same formula as for z, with one adjustment.  Because we do not know the population standard error ($\sigma_{\bar{x}}$), we instead use the sample standard error ($s_{\bar{x}}$).

# $t = \frac{\bar{X} - \mu}{s_{\bar{x}}}$ 
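
As a sanity check on the formula, here is a minimal sketch that computes the t-statistic and two-tailed p-value by hand and compares them with `t.test()`. The data are simulated purely for illustration (the seed, sample size, mean, and SD are assumptions, not the poll data).

```r
# Hypothetical data: 50 simulated "incomes"; not the poll data.
set.seed(42)
x   <- rnorm(50, mean = 75000, sd = 40000)
mu0 <- 72641                               # null-hypothesis mean

se_x     <- sd(x) / sqrt(length(x))        # sample standard error
t_manual <- (mean(x) - mu0) / se_x         # t = (x_bar - mu) / s_xbar
df_x     <- length(x) - 1
p_manual <- 2 * pt(abs(t_manual), df = df_x, lower.tail = FALSE)

fit <- t.test(x, mu = mu0)
all.equal(unname(fit$statistic), t_manual) # TRUE: same t-statistic
all.equal(fit$p.value, p_manual)           # TRUE: same p-value
```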

In [None]:
# calculate t score (observed)
t_obs <- (x_bar - mu_inc) / sampse
t_obs

In [None]:
# calculate p-value
# multiply by 2 because we're using a two-tailed test.
# abs() keeps this correct whether the sample mean is above or below the null value.
2*pt(abs(t_obs), df = dof, lower.tail = FALSE)

In [None]:
# confirm results using the t.test function
t.test(cah_oct$income, mu = mu_inc)

In [None]:
t5 <- t4 + geom_vline(xintercept = t_obs, color = "#66ff33", size = 2)
t5

### Conceptual t-test Example - Conclusion
Our sample mean was $74,710.  

It was not significantly different from the population mean of $72,641.  

The t-score associated with the sample mean is 0.56, which does not exceed the critical t-score of ±1.97 for a two-tailed test at alpha = 0.05, so we fail to reject the null.  

The p-value for the t-test is 0.57, which is higher than alpha = 0.05, so we fail to reject the null: the mean of the sample is not significantly different from the population mean.
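
The t critical value (1.97) is slightly larger than the z critical value (1.96) because the t distribution has heavier tails. The sketch below shows how the two-tailed critical value shrinks toward the normal one as the degrees of freedom grow; the particular df values are arbitrary illustrations.

```r
# Two-tailed critical values at alpha = 0.05 for several degrees of freedom.
dfs     <- c(5, 30, 100, 1000)             # illustrative df values
t_crits <- qt(0.975, df = dfs)
data.frame(df     = dfs,
           t_crit = round(t_crits, 3),
           z_crit = round(qnorm(0.975), 3)) # z_crit is the same in every row
```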

In [None]:
### THIS OPTION IS SIZING FOR JUPYTER NOTEBOOK ONLY
options(repr.plot.width  = 16, repr.plot.height = 6)
#####

grid.arrange(z5, t5, ncol=2)

## One-sample t-test of Proportions - Conceptual Example
We're also going to look at a one-sample t-test of proportions (a percentage).  To do this I'm going to take the trumpvader variable - "Who would you prefer as president of the United States, Darth Vader or Donald Trump?" - and create a 0/1 variable that indicates whether or not a person prefers Darth Vader.  Respondents who prefer Vader are given a value of 1 and those who prefer Trump are given a value of 0.  

From this we can use the same t-test setup, where the mean is the proportion of individuals who support Vader.  $p$ is the population proportion and $\bar{p}$ (p_bar) is the sample proportion.

# $t = \frac{\bar{p} - p}{s_p}$ 

The only difference is in how we calculate the sample standard error:

## $s_p = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ 
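
A small sketch of this standard error on made-up 0/1 data (41 ones out of 100; the data and names are hypothetical). It also shows why treating the 0/1 variable as an ordinary numeric variable gives a slightly different standard error: `sd()` divides by n - 1 rather than n.

```r
# Hypothetical 0/1 data: 41 "yes" responses out of 100.
x_demo <- c(rep(1, 41), rep(0, 59))
n_demo <- length(x_demo)
p_bar  <- mean(x_demo)                            # sample proportion

se_prop  <- sqrt(p_bar * (1 - p_bar) / n_demo)    # proportion formula
se_mean  <- sd(x_demo) / sqrt(n_demo)             # usual sample SE of a mean
c(se_prop = se_prop, se_mean = se_mean)

# The two differ only by the factor sqrt(n / (n - 1)).
all.equal(se_mean, se_prop * sqrt(n_demo / (n_demo - 1)))
```

With a large sample the two values are nearly identical, so the choice matters little in practice.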

What population proportion would you expect?

In [None]:
# data cleaning to create prefvader variable
cah2 <- cah_oct %>% filter(trumpvader != "DK/REF") %>% mutate(prefvader = ifelse(trumpvader == "Darth Vader", 1, 0))

In [None]:
#population proportion
p_pop <- .6 ### NEED TO SET THIS
p_pop
#sample proportion
p_samp <- mean(cah2$prefvader)
p_samp
#sample standard error
sampse_vader <- sqrt((p_samp*(1-p_samp))/nrow(cah2))
sampse_vader

In [None]:
# calculate t-value
t_obs_p <- (p_samp - p_pop) / sampse_vader
t_obs_p

In [None]:
# obtain t-crit
dof <- nrow(cah2) - 1
tcrit_p <- qt(0.025, df = dof)
tcrit_p

In [None]:
# obtain p-value, again we need to multiply by 2 because we are doing a two-tailed test.
2*pt(abs(t_obs_p), df = dof, lower.tail = FALSE) ### abs() + upper tail works whichever direction the difference goes

In [None]:
# confirm results with prop.test
# prop.test(x, n, p = NULL, alternative = "two.sided", correct = TRUE)
x = sum(cah2$prefvader) ## summing a 0/1 variable gives you the count of "yesses"
n = nrow(cah2) 
prop.test(x, n, p = p_pop)

In [None]:
# compare results with t.test()
t.test(cah2$prefvader, mu = p_pop)

In [None]:
### THIS OPTION IS SIZING FOR JUPYTER NOTEBOOK ONLY
options(repr.plot.width  = 8, repr.plot.height = 6)
##### 
ggplot(data.frame(t = c(-6, 6)), aes(x = t)) + 
      stat_function(fun = dt, args = list(df = dof)) +
      labs(y = "", title = "DISTRIBUTION IF NULL IS TRUE")+
      scale_y_continuous(breaks = NULL, expand = c(0,0)) + theme(text = bold.14.text) +
      stat_function(fun = dt, args = list(df = dof), 
                    xlim = c(-6,tcrit_p), geom = "area", alpha=0.2, fill = "#00bcd9")+
      stat_function(fun = dt, args = list(df = dof), 
                    xlim = c(-tcrit_p,6), geom = "area", alpha=0.2, fill = "#00bcd9") +
      annotate(geom="text", x=0, y=.05, label=paste0("Pop. Prop. = \n", p_pop),
               color = "#4d004d", size = 5) + 
      geom_vline(xintercept = t_obs_p, color = "#66ff33", size = 2) +
      annotate(geom="text", x= t_obs_p , y=.1, label=paste0("Observed t = ", round(t_obs_p, digits = 2)),
               color = "#4d004d", size = 5) 

### Conceptual t-test of Proportions Example - Conclusion

About 41% of sampled individuals supported Vader over Trump.

How did our hypothesized population proportion compare?


## Effect Size - Unstandardized

The unstandardized effect size is the difference in means, in the units of our observations.
	
The analyst needs to decide, based on their own judgment, what a substantively significant difference would be.

Let's review the difference in means and the difference in proportions for our previous t-tests and decide whether we think they are substantive.

In [None]:
## difference between sample mean income and population mean income
diff1 <- mean(cah_oct$income) - mu_inc
dollar(diff1)

The difference between the average income in the sample and the average income in the US population is about $2,000.  Personally I would not consider this to be a sizeable difference.

In [None]:
## difference between sample proportion support for Vader and expected/null hypothesis proportion we decided on.

diff2 <- p_samp - p_pop
paste0(round(diff2*100, digits = 2), "%")

What do we conclude about the unstandardized difference in percent supporting Vader?

## Effect Size - t-test of means - Cohen's d
The measure of effect size for a one-sample t-test is Cohen's d.  It is:
- A standardized statistic that measures how far the mean of your observation is from the mean of your null hypothesis (population).
- A unitless measure of effect size
- Not affected by number of observations
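
To make the definition concrete, here is a base-R sketch of a one-sample Cohen's d computed by hand as the absolute difference between the sample mean and the null mean, divided by the sample standard deviation. The data are simulated for illustration only (seed, sample size, mean, and SD are assumptions). It also shows the link to the t-statistic: d equals |t| divided by the square root of n, which is why d does not grow with sample size the way t does.

```r
# Hypothetical data: 300 simulated "incomes"; not the poll data.
set.seed(11)
x_sim <- rnorm(300, mean = 75000, sd = 60000)
mu0   <- 72641

# One-sample Cohen's d: standardized distance from the null mean.
d_manual <- abs(mean(x_sim) - mu0) / sd(x_sim)

# Relationship to the t-statistic: d = |t| / sqrt(n).
t_stat <- unname(t.test(x_sim, mu = mu0)$statistic)
all.equal(d_manual, abs(t_stat) / sqrt(length(x_sim)))
```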

Let's calculate Cohen's d for the difference in mean income. We will use the `cohensD` function from the `lsr` package.

In [None]:
# cohensD(vector of observations, mu = nullhypothesismean)
cohensD(cah_oct$income, mu = mu_inc)

Our "rule of thumb" for interpreting Cohen's d is:

| Cohen's d | Effect Size |
|:---------:|:-----------:|
| 0.20 | Small |
| 0.50 | Medium|
| 0.80 | Large |

Therefore, this value of Cohen's d of 0.032 reflects an extremely small effect.  Combined with our interpretation of the unstandardized effect size, we would conclude that the difference is not substantively significant.


## Effect Size - t-test of proportions - Cohen's h
Cohen's h is the version of Cohen's d that we use when we conduct a t-test of proportions. For this we'll use the `ES.h` function from the `pwr` package.  We will also compare that value to Cohen's d for the same data.
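
Cohen's h is defined as the difference between the arcsine-transformed proportions, $h = 2\arcsin(\sqrt{p_1}) - 2\arcsin(\sqrt{p_2})$. A minimal base-R sketch, using the rounded sample proportion (0.41) and the placeholder population proportion (0.6) that appear in this notebook; the helper name `cohens_h_demo` is hypothetical:

```r
# Hypothetical helper: Cohen's h from two proportions via the arcsine transform.
cohens_h_demo <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

h_demo <- cohens_h_demo(0.41, 0.6)   # sample prop. first, population prop. second
h_demo                               # negative: the sample proportion is below 0.6
abs(h_demo)                          # compare the magnitude to the rule of thumb
```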

In [None]:
### The function is ES.h(ps,pu) - ps is sample prop. and pu is population (universe) prop.
ES.h(p_samp, p_pop)

In [None]:
## comparing to using Cohen's d function with our proportion data.
cohensD(cah2$prefvader, mu = p_pop)

We interpret Cohen's h using the same "rule of thumb" values as Cohen's d.  What do you conclude regarding the substantive significance of the difference in proportions?

## Power Analysis
Similar to the pwr test we used in the Chi-square analysis, a power analysis involves four quantities: effect size, sample size, alpha level, and power.  Given any three, we can solve for the fourth.  The degrees of freedom are calculated from the n value, so we do not supply them separately.

Other t-test specific options are the alternative hypothesis you are using, and the type of test (one sample or two sample).
- n: sample size
- d: effect size (in this case Cohen's d)
- sig.level: alpha
- power: power (1 - β)
- alternative: type of alternative hypothesis (two.sided, less, or greater)
- type: right now we're only looking at one.sample tests
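
To show how these pieces fit together, here is a sketch of a two-sided, one-sample power calculation done by hand with the noncentral t distribution. The inputs are assumptions for illustration (n = 300 is hypothetical and d = 0.032 is the rounded effect size from above), and the result is not guaranteed to match `pwr.t.test` digit for digit.

```r
# Hypothetical inputs; the _demo names are illustrative only.
n_demo     <- 300
d_demo     <- 0.032
alpha_demo <- 0.05

df_demo     <- n_demo - 1
ncp_demo    <- sqrt(n_demo) * d_demo              # noncentrality parameter
t_crit_demo <- qt(1 - alpha_demo / 2, df = df_demo)

# Power = probability that t lands in either rejection region
# when the true standardized effect is d_demo.
power_demo <- pt(t_crit_demo,  df = df_demo, ncp = ncp_demo, lower.tail = FALSE) +
              pt(-t_crit_demo, df = df_demo, ncp = ncp_demo, lower.tail = TRUE)
power_demo
```

With an effect this small, the power is only slightly above alpha, because the distribution under the alternative barely shifts away from the null distribution.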

Our sample mean of income was not significantly different from the population mean.  Did we have enough power to detect an effect?

In [None]:
# setup some of the variables we need
samp_size = nrow(cah_oct)
eff_size = cohensD(cah_oct$income, mu = mu_inc)
alpha = 0.05

# calculate the power
pwr.t.test(n = samp_size, d = eff_size, sig.level = alpha, power = NULL, alternative = "two.sided", type = "one.sample")

The power for our t-test was only 0.08, well below the usual goal of 0.8.  If the true effect is this small, we would have a 92% chance of a Type II error.

How big of a sample would we need to have 80% power?

In [None]:
# calculate the sample size for 80% power
pwr.t.test(n = NULL, d = eff_size, sig.level = alpha, power = 0.8, alternative = "two.sided", type = "one.sample")

To detect a significant difference with an effect size this small, we would need 7411 observations.  What if we only wanted to detect an effect size of 0.2, if it exists?

In [None]:
# calculate the sample size for 80% power
pwr.t.test(n = NULL, d = 0.2, sig.level = alpha, power = 0.8, alternative = "two.sided", type = "one.sample")

If we wanted to detect a significant difference that has a Cohen's d of 0.2 we would need 199 observations.
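
A quick way to see where a number like this comes from is the normal-approximation formula $n \approx \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$. This is a rough sketch, not what `pwr.t.test` computes: it uses the normal distribution instead of the t distribution, so it comes out slightly smaller.

```r
# Normal-approximation sample size for a two-sided one-sample test.
alpha_demo <- 0.05
power_goal <- 0.80
d_target   <- 0.2

n_approx <- ((qnorm(1 - alpha_demo / 2) + qnorm(power_goal)) / d_target)^2
ceiling(n_approx)   # 197, a little below the t-based answer of 199
```

Because d sits in the denominator and is squared, halving the effect size roughly quadruples the required sample size.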

### Power Analysis - Proportions
There is a variant of the power function for proportions - `pwr.p.test`.  It accepts a Cohen's h value instead of a Cohen's d value.

Given the result of our prop.test, what type of analysis should we do?

In [None]:
samp_size_p <- nrow(cah2)
eff_size_p <- ES.h(p_samp, p_pop)


pwr.p.test(n = NULL, h = eff_size_p, sig.level = alpha, power = 0.8, alternative = "two.sided")