In [1]:
library(tidyverse)
library(DescTools)
library(plotrix)
library(lsr)
library(pwr)
library(readxl)

-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.2.1     v purrr   0.3.2
v tibble  2.1.3     v dplyr   0.8.3
v tidyr   0.8.3     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()


For this notebook I'm going to use a few variables I pulled out of the American National Election Studies and cleaned and recoded.  The variables are:

Feeling thermometers (0-100 ratings on topics)
- "ft_sci" = Feelings about Scientists
- "ft_bigbusn" = Feelings about Big Business
- "ft_rich" = Feelings about Rich People
- "ft_congress" = Feelings about Congress

And two demographic variables (I've already collapsed into two groups)
- "race" = White vs. Not White
- "educ" = College Grad or higher vs. lower than college grad

In [2]:
#load the .rds file
anes <- readRDS("anes.rds")
head(anes)


ft_sci,ft_bigbusn,ft_rich,ft_congress,race,educ
70,70,50,70,white,less than BA
70,40,50,50,white,coll grad or higher
70,70,50,50,white,less than BA
50,40,50,50,white,less than BA
85,50,70,50,white,less than BA
50,70,50,0,white,coll grad or higher


# Two Sample t-tests

## Independent Samples

We'll start by looking at two-sample t-tests for independent samples.  This is the most common type of two-sample t-test, used when there is no relationship between the two samples (they aren't paired in any way).

The easiest way to conduct a two-sample t-test in R when you have a dataframe of observations is to specify your test in "model format."  

This format is "dv ~ iv" where dv is the name of your dependent (numerical) variable and iv is the name of your independent variable (groups).


In [3]:
## Conduct a t-test on the mean rating of scientists by education level
## general form - t.test(dv ~ iv, data)


t.test(ft_sci ~ educ, data = anes)


	Welch Two Sample t-test

data:  ft_sci by educ
t = 9.1345, df = 3316, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.627323 7.156726
sample estimates:
mean in group coll grad or higher        mean in group less than BA 
                         80.18772                          74.29569 


The results show us our typical output: t-value, df, and p-value.  This p-value indicates that there is a significant difference in the mean feeling about scientists between those without a college degree and those with a college degree or higher.  The reported means for the two groups show that the higher rating is among those with a college degree or higher.

Remember that the alternative hypothesis here is just reporting the hypothesis tested (diff not equal to 0) and has no bearing on the result of the t-test.

The 95% confidence interval is for the <b> difference </b> between the two means.  We know that the difference is significant because the 95% confidence interval doesn't contain 0.

## Unequal Variances

You may have noticed above that R ran something called the "Welch Two Sample t-test," which is a form of the two-sample t-test that does not assume that the variances of the two groups are equal.  

If we want to use the form of the calculation for equal variances we need to first test to see if our variances are equal.

This test looks at the ratio of our two variances: 

## $$\frac{\sigma_1^2}{\sigma_2^2}$$

The closer that this ratio is to 1, the more equal the variances are.

Because the test is looking at a ratio, we will use an F-value to test our hypothesis of equal variance - the F-value has two parameters, numerator degrees of freedom and denominator degrees of freedom.

The hypotheses for our test of equal variances are:

$H_0: \sigma_1^2 = \sigma_2^2$ (equal variances)

$H_A: \sigma_1^2 \neq \sigma_2^2$ (unequal variances)

In this case, failing to reject the null is often seen as "good" because it means we have no evidence that the variances differ, so the equal-variance version of the t-test is reasonable.

If we reject the null, we must use the version of the t-test that takes unequal variances into account.
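As a rough sketch of what this test computes, the F-value and its two-sided p-value can be reproduced by hand. The data below are simulated stand-ins (the names `g1` and `g2` are hypothetical and not part of the ANES file), so only the mechanics carry over, not the numbers.

```r
# Simulated ratings for two groups (hypothetical data, not ANES)
set.seed(42)
g1 <- rnorm(40, mean = 75, sd = 15)
g2 <- rnorm(60, mean = 75, sd = 20)

# F is the ratio of the two sample variances
f_ratio <- var(g1) / var(g2)
df_num  <- length(g1) - 1   # numerator df = n1 - 1
df_den  <- length(g2) - 1   # denominator df = n2 - 1

# Two-sided p-value: double the smaller tail area of the F distribution
p_lower     <- pf(f_ratio, df_num, df_den)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
vt <- var.test(g1, g2)
c(by_hand = f_ratio, var_test = unname(vt$statistic))
c(by_hand = p_two_sided, var_test = vt$p.value)
```

The by-hand values should agree with `var.test()`, which is a quick way to check your understanding of the output.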


In [4]:
## test to see if our groups (educ) have equal variances for the ft rating about scientists
## general form - var.test(dv ~ iv)

var.test(anes$ft_sci ~ anes$educ)


	F test to compare two variances

data:  anes$ft_sci by anes$educ
F = 0.76954, num df = 1432, denom df = 2089, p-value = 9.067e-08
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.7000132 0.8467061
sample estimates:
ratio of variances 
         0.7695443 


The output shows us that the ratio of variances is about 0.77, meaning that the denominator variance was greater than the numerator variance.  

We get our F-value and our two degrees of freedom, one for the numerator and one for the denominator.  These are n-1 for each of the groups we're comparing (college grad or higher vs. less than college grad).

Our p-value is very low, which means we reject the null hypothesis.  This provides support for the alternative hypothesis, which says that the true ratio of variances is not equal to 1 (unequal variances).  We can also see that our 95% confidence interval for the ratio of variances does not contain one, so they are significantly unequal.

This means that we do need to use the "Welch Two Sample t-test," which, again, is the default in R's `t.test()`.

If we wanted to run the t-test assuming equal variances, the code would be: 

In [5]:
## Conduct a t-test on the mean rating of scientists by education level - assume equal variances
## general form - t.test(dv ~ iv, data, var.equal = TRUE)
t.test(ft_sci ~ educ, data = anes, var.equal = TRUE)


	Two Sample t-test

data:  ft_sci by educ
t = 8.9152, df = 3521, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.596243 7.187805
sample estimates:
mean in group coll grad or higher        mean in group less than BA 
                         80.18772                          74.29569 


In this case our results are very similar: the group means are identical, while the t-value, degrees of freedom, and confidence interval differ slightly.

In [6]:
# run both, save results
eqv <- t.test(ft_sci ~ educ, data = anes, var.equal = TRUE)
uneqv <- t.test(ft_sci ~ educ, data = anes)

In [7]:
# Like before, our saved t-test has many saved pieces we can extract.  Let's look at the structure
str(eqv)

List of 10
 $ statistic  : Named num 8.92
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 3521
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 7.66e-19
 $ conf.int   : num [1:2] 4.6 7.19
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 80.2 74.3
  ..- attr(*, "names")= chr [1:2] "mean in group coll grad or higher" "mean in group less than BA"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ stderr     : num 0.661
 $ alternative: chr "two.sided"
 $ method     : chr " Two Sample t-test"
 $ data.name  : chr "ft_sci by educ"
 - attr(*, "class")= chr "htest"


In [8]:
# compare t-values
eqv$statistic
uneqv$statistic

In [9]:
# compare degrees of freedom
eqv$parameter
uneqv$parameter
length(anes$ft_sci) ## n in dataframe

Why are our degrees of freedom different? The t-test that assumes equal variances in the two groups uses the expected n-2 (3521 here).  The Welch t-test reports a lower value, and in general it is not a whole number.  This is because the degrees of freedom are approximated using a formula called the Welch–Satterthwaite equation, which adjusts for the unequal variances.  

There is little practical benefit, however, to running the t-test with equal variances.  The Welch Two Sample t-test works for both equal and unequal variances, which is why it is the default in R.
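To make the Welch–Satterthwaite approximation concrete, here is a minimal sketch that computes the Welch t-value and degrees of freedom by hand and compares them with `t.test()`. The vectors `a` and `b` are simulated, hypothetical data rather than the ANES variables.

```r
# Simulated ratings for two groups with unequal spread (hypothetical data)
set.seed(1)
a <- rnorm(30, mean = 80, sd = 10)
b <- rnorm(50, mean = 74, sd = 18)

# Each group's contribution to the squared standard error: s^2 / n
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(a) - mean(b)) / sqrt(va + vb)
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Degrees of freedom the equal-variance test would use
df_pooled <- length(a) + length(b) - 2

res <- t.test(a, b)   # Welch is the default
c(by_hand = df_welch, t_test = unname(res$parameter), pooled = df_pooled)
c(by_hand = t_welch, t_test = unname(res$statistic))
```

The Welch degrees of freedom never exceed n-2, and they move further below it as the two variances (or group sizes) become more unequal.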

## Confidence Intervals
As noted before, the confidence intervals provided in the t-test output are the 95% CI for the <b><i> difference </i></b> in the means.  If this CI doesn't contain 0, the difference in the means is significantly different from zero.
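The interval itself is just the difference in means plus or minus a critical t-value times the standard error of the difference. A small sketch with simulated data (the vectors `x` and `y` are hypothetical) shows how it is assembled:

```r
# Simulated ratings for two independent groups (hypothetical data)
set.seed(7)
x <- rnorm(45, mean = 47, sd = 22)
y <- rnorm(80, mean = 41, sd = 20)

res <- t.test(x, y)   # Welch test

diff_means <- mean(x) - mean(y)
# Standard error of the difference (unequal-variance form)
se_diff <- sqrt(var(x) / length(x) + var(y) / length(y))
# Critical value for a 95% interval, using the Welch degrees of freedom
t_crit <- qt(0.975, df = res$parameter)

ci_by_hand <- diff_means + c(-1, 1) * t_crit * se_diff
rbind(by_hand = ci_by_hand, t_test = as.numeric(res$conf.int))
```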

Let's look at another variable to test: the feeling rating about Congress by race (white vs. non-white).

In [10]:
## Conduct a t-test on the mean rating of congress by race
## general form - t.test(dv ~ iv, data)

t.test(ft_congress ~ race, data = anes) 
## don't need any options about equality of variances - we'll use the default that assumes they are not equal


	Welch Two Sample t-test

data:  ft_congress by race
t = 7.3471, df = 1570.9, p-value = 3.241e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.731749 8.178400
sample estimates:
mean in group not_white     mean in group white 
               47.35863                40.90355 


The 95% CI for the difference in mean feeling about Congress between non-white and white respondents is from 4.73 to 8.18 points (on a scale from 0 to 100), with non-white respondents giving the higher mean rating.

## Paired Samples t-tests
When our samples are not independent because they are paired in some way, we need to use a special form of the t-test to account for the paired scores.  In most cases they are two measurements of the same variable taken from the same unit (same person) at two different times.  In other cases, it may be appropriate to use paired samples tests even if the measurements are not from the exact same person, such as when they are from spouses.  Some medical trials also pair together two individuals who match on many demographic characteristics and give one the treatment and one the control.  Since the pairing determined their treatment, the two can't be considered independent.

For a paired t-test the $SE_{diff}$ is calculated with the difference between the pre- and post- scores, where $d$ is the difference, $n$ is the number of <b> pairs </b> and $df = n-1$ (number of pairs minus 1).  

# $$  t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{n(\sum d^2) - (\sum d)^2}{n^2(n-1)}}}  $$

This calculation should look similar to the "computational" formula for variance we saw in the descriptive statistics lecture.  If we have pre- and post- scores from every person in the dataframe, then n-1 is the number of people minus 1.
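As a sketch of the paired calculation, the code below builds the t-value from the differences, using the standard error $s_d / \sqrt{n}$ written in computational form, and checks it against `t.test()`. The `pre` and `post` vectors are simulated, hypothetical scores.

```r
# Simulated pre/post scores for the same 25 people (hypothetical data)
set.seed(3)
pre  <- round(runif(25, 0, 100))
post <- pmin(100, pmax(0, pre + round(rnorm(25, mean = 3, sd = 12))))

d <- pre - post    # one difference per pair
n <- length(d)     # number of pairs

# Standard error of the mean difference, computational form of sd(d) / sqrt(n)
se_d <- sqrt((n * sum(d^2) - sum(d)^2) / (n^2 * (n - 1)))
t_by_hand <- mean(d) / se_d

res <- t.test(pre, post, paired = TRUE)
c(by_hand = t_by_hand, t_test = unname(res$statistic), df = unname(res$parameter))
```

Note that `mean(d)` equals `mean(pre) - mean(post)`, so a paired t-test is the same as a one-sample t-test on the differences.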

For this example I'm going to load in another cleaned set of variables from ANES, this time pre- and post-election feeling thermometer ratings of the democratic and republican presidential candidates from 2016 (Clinton and Trump).



In [11]:
#load the .rds file
anes2 <- readRDS("anes2.rds")
head(anes2)

ft_pre_dem,ft_pre_rep,ft_post_dem,ft_post_rep,partyid
0,85,15,85,rep
0,85,50,60,rep
85,0,85,50,dem
0,85,0,100,dem
85,0,70,15,dem
50,60,0,70,dem


I'm going to first compare the mean rating of the democratic candidate, Clinton, pre- and post-election among all respondents.

In [12]:
## paired t-test code - t.test(pre, post, paired = TRUE)
## paired = TRUE because each respondent contributes both a pre and a post rating

t.test(anes2$ft_pre_dem, anes2$ft_post_dem, paired = TRUE)



	Paired t-test

data:  anes2$ft_pre_dem and anes2$ft_post_dem
t = -3.4291, df = 2237, p-value = 0.0006167
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.8710909 -0.5096062
sample estimates:
mean of the differences 
              -1.190349 


This result again looks very similar to what we've seen previously.  Our p-value is less than alpha = 0.05, so we can reject the hypothesis that the difference in means is 0 and conclude that there is a statistically significant difference in the pre- and post-election ratings of Hillary Clinton.

Again, the confidence interval is for the difference in means - the negative number is a bit confusing - it comes from pre minus post, so the mean was higher in the post-election interview.  The mean displayed in the output is also just the difference, not the means of the two ratings.  If we're interested in those we can compare:

In [13]:
# get means for pre- and post- Clinton ratings
mean(anes2$ft_pre_dem)
mean(anes2$ft_post_dem)
mean(anes2$ft_pre_dem) - mean(anes2$ft_post_dem) ## difference in means

We could also look at the difference within each party ID by running the t-test separately for each subgroup.
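One possible way to run the same paired test for every subgroup at once is to split the data frame and apply the test to each piece. This is only a sketch on a simulated data frame; `toy` and its columns `pre`, `post`, and `group` are hypothetical names, not the ANES variables.

```r
# Simulated pre/post ratings with a grouping variable (hypothetical data)
set.seed(11)
toy <- data.frame(
  pre   = round(runif(60, 0, 100)),
  group = rep(c("dem", "rep"), each = 30)
)
toy$post <- pmin(100, pmax(0, toy$pre + round(rnorm(60, mean = 2, sd = 10))))

# Split the data frame by group, then run one paired test per piece
by_group <- lapply(split(toy, toy$group), function(g) {
  t.test(g$pre, g$post, paired = TRUE)
})

# Pull the mean difference and p-value out of each saved result
sapply(by_group, function(r) c(mean_diff = unname(r$estimate), p = r$p.value))
```

Filtering each group into its own data frame gives the same results and is easier to read when there are only two or three groups.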

In [14]:
## paired t-test on the filtered group of only democrats
dems <- anes2 %>% filter(partyid == "dem")
t.test(dems$ft_pre_dem, dems$ft_post_dem, paired = TRUE)


	Paired t-test

data:  dems$ft_pre_dem and dems$ft_post_dem
t = -1.9124, df = 1217, p-value = 0.05605
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.98096306  0.02529804
sample estimates:
mean of the differences 
             -0.9778325 


For the democrats the difference in pre- and post-election ratings was slightly smaller (-0.98 vs. -1.19 for all respondents), and is not significant at alpha = 0.05.  The p-value is greater than 0.05 and the confidence interval crosses over 0.  This means that there is NOT a statistically significant difference between the pre- and post-election ratings of Clinton among democrats.

In [15]:
## paired t-test on the filtered group of only republicans - ratings of Trump
reps <- anes2 %>% filter(partyid == "rep")
t.test(reps$ft_pre_rep, reps$ft_post_rep, paired = TRUE)


	Paired t-test

data:  reps$ft_pre_rep and reps$ft_post_rep
t = -9.7663, df = 1019, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.570947 -4.372190
sample estimates:
mean of the differences 
              -5.471569 


In [16]:
mean(reps$ft_pre_rep)
mean(reps$ft_post_rep)