# Two-Sample t-tests
<a id = "top"></a>
This lab will focus on how to conduct two-sample t-tests in practical application.  It will provide multiple examples of two-sample t-tests of both numerical variables and 0/1 variables (test of proportions).  Each example will take you through the entire process: examining the data, conducting the t-test, and looking at effect size.

The stand-alone Lecture 7 notebook we covered in class has the conceptual examples to understand what is going on "under the hood."

### Table of Contents

- [Test of Means - Two-sample](#mean1)
- [Test of Means - Paired Test](#mean2)
- [Test of Proportions Example](#prop)
- [Test of Median - Non-parametric](#median)
- [Power Analysis](#power)
- [Visualizing Differences in Means](#viz)
- [Practice Problem](#prac)

In [None]:
# LIBRARIES
library(tidyverse)
library(magrittr) ## for pipe operators
library(pwr) ## for power function and ES.h (Cohen's h)
library(scales) ## for scaling functions for ggplot2
library(effsize) ## for Cohen's D
library(DescTools)

## bold text specification for ggplot
bold.14.text <- element_text(face = "bold", size = 14)

### these plot size options are for jupyter notebooks ONLY
options(repr.plot.width  = 8,
        repr.plot.height = 6)

In [None]:
## LOAD the DATA
cah <- read_csv("201806-CAH_PulseOfTheNation_Raw.csv")
## variable names currently full questions - need to rename
new_names <- c("gender", "age", "agerange", "race", "income", "educ", "partyid", "polaffil", 
               "trump", "hollymoney", "fed_min_is", "fed_min_should", "fed_tax_is", "fed_tax_should", 
               "redist", "redist_you", "redist_people", "baseincome", "faircomp", "ceofair", "attractive")
colnames(cah) <- new_names
glimpse(cah)

In [None]:
#question text
spec(cah)

<a id="mean1"></a>
## Test of Means - Example 1
Again we're going to use data from a Cards Against Humanity poll, this time from June 2018.  Some of the questions in this month's poll focused on the federal minimum wage.  For this first example I'm going to look at the variable `fed_min_is` that reflects the answers to the question - "If you had to guess, in dollars per hour, what do you think the federal minimum wage is?"  In the last lab we compared the overall sample mean to a null hypothesis mean of $7.25.  This time we're going to compare the mean guessed minimum wage by race to see if there is a difference between White respondents and non-White respondents.

I pre-inspected the data and noticed both NA values and some extreme outliers.  I'm going to quickly handle that data cleaning step.  Note that I'm only removing NA and DK/REF on these _**TWO**_ variables and not all of the variables in the dataset.  No reason to limit observations based on NA on variables we're not using.

In [None]:
cah1 <- cah %>% drop_na(fed_min_is) %>% filter(fed_min_is < 40 & race != "DK/REF")  %>% mutate(race = fct_lump(race))
summary(cah1$fed_min_is)

<a id="density"></a>
We'll look at a quick visualization of the distribution, then we'll proceed to our hypothesis test.

In [None]:
cah1 %>%
  ggplot( aes(x=fed_min_is, fill=race)) +
    geom_density(alpha=0.6) +
    scale_fill_manual(values=c("#26d5b8", "#ff5733")) +
    labs(fill="Race",
         y = "Density",
         x = "Guessed Minimum Wage",
         title = "Distribution of Guesses of Federal Minimum Wage by Race") +
    theme(text = bold.14.text)

The density graph above shows us the distribution of `fed_min_is` by `race`.  Both distributions appear to deviate a bit from normality and have right skew.  Both are bimodal, but the second mode in the "Other" race category appears to be larger - indicating perhaps that there is more spread (variance) in that group.

### Step 1 - Formulate Hypothesis

$H_0: \mu_{white} = \mu_{other}$

$H_A : \mu_{white} \neq \mu_{other}$

### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

Random and independent sample ->>> Yes

Sample is <10% of the population? ->>> Yes

Sampling distribution is normally distributed? ->>> Yes, given Central Limit Theorem

**Are the variances of each sample equal? ->>>**

The variance of each group/sample ($s_x^2$) needs to be relatively equal to the other.  If the variances are equal, their ratio is one.

We can check this assumption via hypothesis test:

$H_0: \sigma_1^2 = \sigma_2^2$

$H_A: \sigma_1^2 \neq \sigma_2^2$


In [None]:
## use var.test() to test the homogeneity of variances before running t-test
## var.test(outcomevariable_numeric ~ predictorvariable_categorical)
var.test(cah1$fed_min_is ~ cah1$race)

#### Are the variances significantly different?
No.  Because the p-value is greater than alpha we fail to reject the null hypothesis.  We have no evidence that the variances differ, so we can proceed as if they are equal and treat the assumption as met.  So in this case it's a "good" thing to fail to reject null.
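To make the "ratio of one" idea concrete, here is a minimal sketch on simulated stand-in data (the `guess` and `group` vectors are hypothetical, not the CAH poll). It computes the ratio of the two group variances by hand and compares it with the F statistic that `var.test()` reports.

```r
# Simulated stand-in data (hypothetical; NOT the CAH poll data)
set.seed(42)
guess <- c(rnorm(120, mean = 10, sd = 3), rnorm(80, mean = 10.5, sd = 3.3))
group <- factor(rep(c("White", "Other"), times = c(120, 80)))

# Variance of each group, returned in factor-level order ("Other", "White")
vars <- tapply(guess, group, var)

# F statistic by hand: variance of the first level over variance of the second
f_by_hand <- unname(vars[1] / vars[2])

# var.test() with a formula uses the same ordering of the factor levels
vt <- var.test(guess ~ group)

f_by_hand            # ratio of variances computed by hand
unname(vt$statistic) # F statistic reported by var.test()
vt$p.value           # compare this with alpha
```

A ratio close to one goes with a large p-value; the further the ratio is from one, the smaller the p-value becomes.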


### Step 3: Run the t-test
We can call the `t.test()` function using a "formula" specification similar to that we provided to the var.test() function above.  We do not need to specify a mu because we're not doing a one-sample t-test.

In [None]:
# t.test(outcomevariable_numeric ~ predictorvariable_categorical) 
# options to change defaults include alternative =, conf.level =, and var.equal = TRUE

t.test(cah1$fed_min_is ~ cah1$race, var.equal = TRUE)
### IMPORTANT - I can use var.equal = TRUE because var.test() was not significant; do not use var.equal = TRUE if your var.test() is significant

#### Conclusions:
I'm going to review the t-test output from bottom to top.
1. The last line at the bottom of the output tells us the two group means.  The mean of min wage guess in White respondent group is 10.15 dollars.  The mean of min wage guess in Other respondent group is 10.59 dollars.  The difference in these values is not substantively significant.
2. We get a 95% CI.  This is the 95% CI of the _**difference**_ in means between the two groups.  Because this confidence interval crosses 0 we can conclude that the difference in means is not statistically significant.  This is because the null hypothesis says that the difference in means is 0; when our null hypothesis value is in the CI our sample estimate is not significant.
3. At the top we get the t-value, the degrees of freedom, and the p-value.
    - The t-value is negative.  This is because the mean of the first group (White) is lower than the mean of the second group (Other). If the factor levels were reversed the sign would be different but it wouldn't affect our result.
    - The degrees of freedom is 685, which is n_1 + n_2 - 2.
    - The p-value is 0.1867, which is higher than an alpha of 0.05, therefore we fail to reject null.
4. There is no significant difference in the guesses of what federal minimum wage is by race.  In this sample, race does not appear to be related to a respondent's guess of the federal minimum wage.
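The pieces of the output reviewed above can be reproduced by hand. The sketch below uses simulated stand-in vectors (`white` and `other` are hypothetical, not the poll data) and the standard pooled-variance formulas to rebuild the t-value, the degrees of freedom, and the 95% CI of the difference, then compares them with `t.test(..., var.equal = TRUE)`.

```r
# Simulated stand-in data (hypothetical; NOT the CAH poll data)
set.seed(1)
white <- rnorm(150, mean = 10.2, sd = 3)
other <- rnorm(90,  mean = 10.6, sd = 3)
n1 <- length(white)
n2 <- length(other)

# Pooled variance and the standard error of the difference in means
sp2 <- ((n1 - 1) * var(white) + (n2 - 1) * var(other)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t-value, degrees of freedom, and 95% CI of the difference (first minus second)
diff_means <- mean(white) - mean(other)
t_by_hand  <- diff_means / se
df_by_hand <- n1 + n2 - 2
ci_by_hand <- diff_means + c(-1, 1) * qt(0.975, df_by_hand) * se

# The built-in test should agree with the hand calculation
tt <- t.test(white, other, var.equal = TRUE)
c(by_hand = t_by_hand, t.test = unname(tt$statistic))
ci_by_hand
as.numeric(tt$conf.int)
```

Swapping the order of the two groups flips the sign of the t-value and of both CI limits, but not the p-value.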


### Substantive Significance - Effect Size

Our result was not statistically significant, however we can still review the substantive significance.  Occasionally we may have a substantive difference that does not reach the threshold for statistical significance, typically due to inadequate power.

#### Unstandardized Effect Size
This is the "raw" difference in means in the units of the observations.  In this case the difference is about 45 cents, which is not at all substantive.

#### Standardized Effect Size
For this test of means we'll use Cohen's d to determine the standardized effect size. Cohen's d for a two-sample t-test has the same interpretation as with one-sample t-tests.  We're going to use a different package for effect sizes moving forward, which has more options for the various versions of Cohen's d - including Hedges' g and Cohen's d for paired t-tests.  We'll also look at r-squared.  We can use both to make a conclusion about substantive significance.

In [None]:
# cohen.d(outcomevariable_numeric ~ predictorvariable_categorical)
cohen.d(cah1$fed_min_is ~ cah1$race)

In [None]:
# effect size - rsquared
fed <- t.test(cah1$fed_min_is ~ cah1$race, var.equal = TRUE)
# the $statistic of a saved t-test object is the t-value.  The $parameter is the degrees of freedom
# rsquared of t-test is t-squared over t-squared plus df
rsq <- fed$statistic^2 / (fed$statistic^2 + fed$parameter)
names(rsq) <- "r-squared" # re-label the value from t to rsq
rsq # proportion
percent(rsq, accuracy = .01) # percentage

The Cohen's d value (about 0.13 in magnitude) falls below the "small" threshold of 0.2 in our "rule of thumb" values.  R has also labeled this difference as negligible. The r-squared value of 0.25% tells us the percentage of the variance in our outcome that is explained by the predictor: race explains practically none of the variance in the responses of what federal minimum wage is. This corresponds with our unstandardized conclusion - there is no substantive significance.
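For intuition about how the two standardized measures relate to the t-value, here is a sketch on simulated stand-in data (all vectors are hypothetical). It assumes the pooled-standard-deviation definition of Cohen's d, and checks that r-squared from the t-test formula equals the squared correlation between the outcome and a 0/1 group indicator.

```r
# Simulated stand-in data (hypothetical; NOT the CAH poll data)
set.seed(11)
y0 <- rnorm(150, mean = 10.2, sd = 3)  # first group
y1 <- rnorm(90,  mean = 10.6, sd = 3)  # second group
n1 <- length(y0)
n2 <- length(y1)

# Cohen's d: difference in means divided by the pooled standard deviation
sp <- sqrt(((n1 - 1) * var(y0) + (n2 - 1) * var(y1)) / (n1 + n2 - 2))
d  <- (mean(y0) - mean(y1)) / sp

# r-squared from the pooled t-test: t^2 / (t^2 + df)
tt    <- t.test(y0, y1, var.equal = TRUE)
t_val <- unname(tt$statistic)
df    <- unname(tt$parameter)
rsq   <- t_val^2 / (t_val^2 + df)

# Same quantity as the squared correlation of the outcome with a 0/1 dummy
dummy  <- rep(c(0, 1), times = c(n1, n2))
rsq_cor <- cor(c(y0, y1), dummy)^2

c(d = d, rsq = rsq, rsq_cor = rsq_cor)
```

Both measures are rescalings of the same t-value, which is why they lead to the same substantive conclusion.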

[Return to Top](#top)
<a id = "mean2"></a>

## Test of Means - Paired sample

For this example we're going to conduct a paired t-test.  We're going to compare the means of `fed_min_is` to the mean of `fed_min_should` to see if they are significantly different.  Because these are two measures taken from the same respondent we must use a paired t-test.

In this case we're comparing the distribution of two numerical variables instead of defining two groups using a categorical variable.  The numerical variables we compare via a paired t-test need to be on the same scale and have the same units.

In [None]:
# data cleaning - go back to original df, create a new df for this analysis that removes NAs on fed_min variables 
# as well as outliers.
cah2 <- cah %>% drop_na(fed_min_is, fed_min_should) %>% filter(fed_min_is < 40 & fed_min_should < 40)

To visualize paired differences we need to combine the observations into one column with another column to act as the indicator of the two groups (in this example the "is" group vs. the "should" group). So I need to convert the data from "wide" to "long" for this purpose.
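As a quick illustration of the wide-to-long reshape, here is a sketch on a three-row toy data frame (the values are made up) using `pivot_longer()` from tidyr. Each respondent's two answers become two rows, with one column naming the question and one holding the wage.

```r
library(tidyr)

# Toy "wide" data (hypothetical values): one row per respondent
wide <- data.frame(fed_min_is     = c(7.25, 9, 10),
                   fed_min_should = c(12, 15, 10))

# Reshape to "long": one row per respondent-question combination
long <- pivot_longer(wide,
                     cols      = c(fed_min_is, fed_min_should),
                     names_to  = "question",  # which variable the value came from
                     values_to = "wage")      # the value itself

long
```

The long table has twice as many rows as the wide one, which is what lets ggplot use `question` as a fill aesthetic.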

In [None]:
cah_long <- cah2 %>% 
                # select two variables
                select(fed_min_is, fed_min_should) %>% 
                # pivot longer: specify columns, what to name the level variable, what to name the values column
                pivot_longer(cols = c(fed_min_is, fed_min_should), names_to = "question", values_to = "wage")  %>% 
                mutate(question = fct_recode(question, "Fed. Min. Wage is" = "fed_min_is", 
                                                       "Fed. Min. Wage should be" = "fed_min_should"))
head(cah_long)

In [None]:
p1 <- cah_long %>%
  ggplot(aes(x=wage, fill=question)) +
    geom_density(alpha=0.6) +
    scale_fill_manual(values=c("#26d5b8", "#ff5733")) +
    labs(fill="",
         y = "Density",
         x = "Hourly Minimum Wage",
         title = "Distribution of Federal Minimum Wage",
         subtitle = "What people think it is vs. what they think it should be") +
    theme(text = bold.14.text, legend.position = "top") 

## adding optional stuff - lines and annotations to place and label means.
is_mean <- mean(cah2$fed_min_is)
should_mean <- mean(cah2$fed_min_should)

p2 <- p1 + 
    scale_x_continuous(labels = dollar) +
    geom_vline(xintercept = is_mean, color = "#26d5b8", size = 2) +
    geom_vline(xintercept = should_mean, color = "#ff5733", size = 2) +
    annotate(geom="text", x=27, y=.15, 
             label=paste0("Min Wage Is Mean = ", dollar(round(is_mean, digits = 2))),
             color = "#26d5b8", size = 6, fontface = 2)+
    annotate(geom="text", x=27, y=.125, 
             label=paste0("Min Wage Should Mean = ", dollar(round(should_mean, digits = 2))),
             color = "#ff5733", size = 6, fontface = 2)

p2

### Step 1 - Formulate Hypothesis

$H_0: \mu_{is} = \mu_{should}$

$H_A : \mu_{is} \neq \mu_{should}$


### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

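Random and independent sample ->>> Random, yes; but the two measures come from the same respondents and are not independent of each other, hence the paired t-test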

Sample is <10% of the population? ->>> Yes

Sampling distribution is normally distributed? ->>> Yes, given Central Limit Theorem

**Are the variances of each sample equal? ->>>**

In [None]:
#var.test(variable1, variable2)
var.test(cah2$fed_min_is, cah2$fed_min_should)

#### Are the variances significantly different?
**YES.** Because the p-value is less than an alpha of 0.05 we reject null and conclude that the variances of the two variables are significantly different.  For a paired t-test, however, this does not create a problem: the test is computed on the within-respondent differences (it is effectively a one-sample t-test on those differences), so homogeneity of variance between the two variables is not an assumption we need to satisfy.  The check is shown here for completeness.

### Step 3: Run the t-test
We can call the `t.test()` function using our two variables and `paired = TRUE`. Because the paired test works on the differences between the two measures, the `var.equal` option does not apply here, so we leave it out.

In [None]:
# t.test(variable1, variable2, paired = TRUE) 
# paired = TRUE is the only extra option needed; t.test() has no pooled argument
t.test(cah2$fed_min_is, cah2$fed_min_should, paired = TRUE)

#### Conclusions:
1. At the top we get the t-value, the degrees of freedom, and the p-value.
    - The t-value is negative.  This is because the mean of the "is" values is lower than the mean of the "should be" values. If the variables were reversed in your call to t.test() the sign would be different but it wouldn't affect our result.
    - The degrees of freedom is the number of pairs minus 1.
    - The p-value is very small, which is much lower than an alpha of 0.05, therefore we reject null.
2. We get a 95% CI.  This is the 95% CI of the _**difference**_ in means between the two wage variables.  Because this confidence interval does not cross 0 we can conclude that the difference in means is statistically significant.  This is because the null hypothesis says that the difference in means is 0; when our null hypothesis value is in the CI our sample estimate is not significant.
3. There is a significant difference in what our respondents report the minimum wage is and what they think it should be.
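A paired t-test is arithmetically a one-sample t-test on the within-pair differences. The sketch below shows this on simulated stand-in data (`is_guess` and `should` are hypothetical vectors, not the poll data): both calls give the same t-value, the same degrees of freedom (number of pairs minus 1), and the same p-value.

```r
# Simulated stand-in data (hypothetical; NOT the CAH poll data)
set.seed(7)
is_guess <- rnorm(100, mean = 10, sd = 3)
should   <- is_guess + rnorm(100, mean = 2, sd = 3)  # same "respondents"

# Paired t-test on the two measures
paired  <- t.test(is_guess, should, paired = TRUE)

# One-sample t-test on the within-pair differences against mu = 0
one_smp <- t.test(is_guess - should, mu = 0)

c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
c(paired = unname(paired$parameter), one_sample = unname(one_smp$parameter))
c(paired = paired$p.value,           one_sample = one_smp$p.value)
```

This is also why the variances of the two original variables do not enter the paired test: only the variance of the differences does.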

#### Unstandardized Effect Size
This is the "raw" difference in means in the units of the observations.  This difference is about $2.  This seems like a moderate, not substantively large difference.

#### Standardized Effect Size
We'll use both Cohen's d and r-squared.

In [None]:
# same function; cohen.d() (unlike t.test()) accepts both pooled and paired arguments
cohen.d(cah2$fed_min_is, cah2$fed_min_should, pooled = TRUE, paired = TRUE)

In [None]:
# effect size - rsquared
pair <- t.test(cah2$fed_min_is, cah2$fed_min_should, paired = TRUE)
# the $statistic of a saved t-test object is the t-value.  The $parameter is the degrees of freedom
# rsquared of t-test is t-squared over t-squared plus df
rsq2 <- pair$statistic^2 / (pair$statistic^2 + pair$parameter)
names(rsq2) <- "r-squared" # re-label the value from t to rsq
rsq2 # proportion
percent(rsq2, accuracy = .01) # percentage

The Cohen's d value (0.68) corresponds with a medium/large effect size based on our "rule of thumb" values.  It shows that the difference of two dollars in this test is fairly substantive.  The r-squared of 0.337 (or 33.7%) shows us that about 34% of the variance in the sample is explained by the difference between the questions (what the federal minimum wage is vs. what it should be), which is a substantial amount of variance explained.


[Return to Top](#top)
<a id = "prop"></a>

## Example 3 - Test of Proportions

For our test of proportions example we'll look at support for Universal Basic Income.  Universal Basic Income is a monthly income provided to all citizens by the government, regardless of need.  We'll see if support for UBI varies by gender.

### Step 1 - Formulate Hypothesis

$H_0 : p_{female} = p_{male}$

$H_A : p_{female} \neq p_{male}$

First we'll do some quick data cleaning to remove observations with the value of "DK/REF" and convert the "Yes" and "No" values on baseincome to 1 and 0. Remember we're cleaning the df fresh for each analysis, dropping unusable observations only on the variables we're currently using.

Then we'll look at a visualization of the difference.  Because these are proportions we use a bar chart with 95% CI error bars, not a histogram or density plot.
<a id = "bar"></a>

In [None]:
cah3 <- cah %>% filter(baseincome != "DK/REF" & !(gender %in% c("DK/REF", "Other"))) %>% 
                mutate(ubi_support = ifelse(baseincome == "Yes", 1, 0))
table(cah3$gender)
table(cah3$ubi_support)
summary(cah3$ubi_support)

In [None]:
#create table with data needed for bar chart and 95% CIs
proptab <- cah3 %>% 
            group_by(gender)  %>% 
            summarize(prop = mean(ubi_support),
                      se = sqrt(prop*(1-prop)) / sqrt(n()))

#create bar chart of proportions with 95% CIs
proptab %>% ggplot(aes(x = gender, y = prop, fill = gender)) +
                geom_bar(stat = "identity", position = position_dodge()) +
                geom_errorbar(aes(ymin = prop - 1.96*se, ymax = prop + 1.96*se), 
                                  width = 0.3, position = position_dodge(0.9), size = 1) +
                labs(title = "Proportion Who Support Universal Basic Income by Gender",
                     subtitle = "With 95% Confidence Interval",
                     x = "Gender",
                     y = "Proportion") +
                theme(legend.position = "none", text = bold.14.text) +
                scale_fill_manual(values=c("#ff5733", "#26d5b8")) 

Because the confidence intervals (error bars) do not appear to overlap, we can conclude that the difference is likely statistically significant.
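The error bars come from the usual standard error of a proportion. As a minimal sketch with made-up numbers (the 0/1 vector below is hypothetical), the standard error and 95% CI for one group can be computed like this. Note that non-overlapping intervals are a conservative visual check: intervals that overlap slightly can still go with a significant difference, so the formal test is still needed.

```r
# Hypothetical 0/1 support indicator for one group: 180 "Yes" out of 400
support <- rep(c(1, 0), times = c(180, 220))

# For a 0/1 variable the mean is the proportion
prop <- mean(support)
n    <- length(support)

# Standard error of a proportion and the 95% confidence interval
se <- sqrt(prop * (1 - prop)) / sqrt(n)
ci <- prop + c(-1, 1) * 1.96 * se

c(prop = prop, se = se, lower = ci[1], upper = ci[2])
```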

### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

Random and independent sample ->>> Yes

Sample is <10% of the population? ->>> Yes

Sampling distribution is normally distributed? ->>> Yes, given Central Limit Theorem

**Are the variances of each sample equal? ->>>**

In [None]:
## use var.test() to test the homogeneity of variances before running t-test
## var.test(outcomevariable_numeric ~ predictorvariable_categorical)
var.test(cah3$ubi_support ~ cah3$gender)

#### Are the variances different?
No. We fail to reject null and therefore can conclude that the variances are not significantly different; so we do not violate the assumption of homogeneity of variance.  We can use the unadjusted version of the t-test, but we're going to take a look at the Welch's t-test with adjustment in this example.  By default, unless you specify otherwise, R will calculate the t-test assuming that variances are unequal.

### Step 3: Run the test of proportions
We'll use t.test() to run the test of proportions; treating proportions as means.  We will also see that chi-square test is an equally valid test to use for this purpose.

In [None]:
# t.test(outcomevariable_numeric ~ predictorvariable_categorical) 
# default alternative hypothesis is two.sided

t.test(cah3$ubi_support ~ cah3$gender)

#### Conclusions (t-test version):
1. At the top we get the t-value, the degrees of freedom, and the p-value.
    - The degrees of freedom is not an integer.  This is due to the adjustment for unequal variances - the adjustment reduces our degrees of freedom.
    - The p-value is very small, which is much lower than an alpha of 0.05, therefore we reject null.
2. We get a 95% CI.  This is the 95% CI of the _**difference**_ in means between the proportion supporting UBI.  Because this confidence interval does not cross 0 we can conclude that the difference in means is statistically significant.  This is because the null hypothesis says that the difference in means is 0; when our null hypothesis value is in the CI our sample estimate is not significant.
3. There is a significant difference in support for UBI by gender. Women are significantly more likely to support UBI.

In [None]:
# chisq.test instead, use the baseincome variable with the two-levels, yes and no
ubitab <- table(cah3$baseincome, cah3$gender)
chisq.test(ubitab)

#### Conclusions (chi-square version):
1. We essentially get an identical result. The p-value is very small, which is much lower than an alpha of 0.05, therefore we reject null.
2. There is a significant difference in support for UBI by gender. Support for UBI is dependent on the gender of the individual.

When we have a two-sample test of proportions it can be run as either a t-test OR a chi-square test.  They are equally valid.
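A third route to the same comparison is `prop.test()`. The sketch below uses a made-up 2x2 table (the counts are hypothetical) and shows that, with the continuity correction turned off in both, `prop.test()` and `chisq.test()` report the same X-squared statistic and p-value. By default both functions apply a continuity correction to 2x2 tables, so their default output will differ slightly from the uncorrected values shown here.

```r
# Hypothetical 2x2 table of counts (NOT the CAH poll data)
tab <- matrix(c(230, 170,   # Female: Yes, No
                180, 220),  # Male:   Yes, No
              nrow = 2,
              dimnames = list(support = c("Yes", "No"),
                              gender  = c("Female", "Male")))

# Chi-square test of independence, no continuity correction
chi <- chisq.test(tab, correct = FALSE)

# Two-sample test of proportions on the same counts, no continuity correction
pt2 <- prop.test(x = tab["Yes", ], n = colSums(tab), correct = FALSE)

c(chisq = unname(chi$statistic), prop = unname(pt2$statistic))
c(chisq = chi$p.value,           prop = pt2$p.value)
```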

#### Unstandardized Effect Size
This is the "raw" difference in proportions by gender - about 13 percentage points.  I would consider a difference of over 10 percentage points a large substantive difference.

#### Standardized Effect Size
For this test of proportions we'll look at Cohen's d to determine the standardized effect size, but will also look at r-squared.

In [None]:
# use the cohen.d() function as it's close enough to Cohen's h
# cohen.d(outcomevariable_numeric ~ predictorvariable_categorical)
cohen.d(cah3$ubi_support ~ cah3$gender)

In [None]:
# effect size - rsquared
ubi <- t.test(cah3$ubi_support ~ cah3$gender)
# the $statistic of a saved t-test object is the t-value.  The $parameter is the degrees of freedom
# rsquared of t-test is t-squared over t-squared plus df
rsq3 <- ubi$statistic^2 / (ubi$statistic^2 + ubi$parameter)
names(rsq3) <- "r-squared" # re-label the value from t to rsq
rsq3 # proportion
percent(rsq3, accuracy = .01) # percentage

In [None]:
## compare cramer v from chisq.test version
CramerV(ubitab)

The Cohen's d shows a small substantive difference in the means, however r-squared indicates that gender only explains 2% of the variation in support for UBI.  I also calculated Cramer's V for the version we ran using chisq.test; it also shows a small difference.  We can conclude that the difference is substantively significant, but that gender is not the best variable to explain the differences in support for UBI - perhaps there is a mediating or confounding variable.
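For proportions specifically, Cohen's h is the effect size designed for the job, and it is what the `pwr` package's `ES.h()` function computes. As a sketch with made-up proportions (the two values below are hypothetical, not the poll results), h is the difference between the arcsine-transformed proportions and is read against the same 0.2 / 0.5 / 0.8 rule-of-thumb values as Cohen's d.

```r
# Hypothetical group proportions (NOT the CAH poll results)
p_female <- 0.55
p_male   <- 0.42

# Cohen's h: difference of arcsine-square-root transformed proportions
h <- 2 * asin(sqrt(p_female)) - 2 * asin(sqrt(p_male))
h

# If the pwr package is loaded, pwr::ES.h(p_female, p_male) gives the same value
```

With these made-up values h lands a little above the "small" threshold, even though the raw gap is 13 percentage points.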

[Return to Top](#top)
<a id = "median"></a>

## Test of Medians - non-parametric
We're going to test the difference in median of self-rating of attractiveness by education.  We might hypothesize that there will be a difference because higher educated individuals may not worry too much about how attractive they are. We'll use the non-parametric test because the attractive variable is ordinal.  

### Step 1 - Formulate Hypothesis
Instead of comparing means we're comparing medians.

$H_0: $ There is no difference in medians between the two groups.

$H_A: $ There is a significant difference in medians between the two groups

Before we start, we have to do a bit of data cleaning and then we'll visually inspect the distributions.
<a id = "box"></a>

In [None]:
cah4 <- cah %>% filter(educ != "DK/REF" & attractive != "DK/REF") %>% 
                mutate(attractive = replace(attractive, attractive == "Not attractive at all", "1")) %>% 
                mutate(attractive = replace(attractive, attractive == "Very attractive", "10")) %>% 
                mutate(attractive = as.numeric(attractive)) %>% 
                mutate(educ = fct_collapse(educ, 
                                           "Some College or Less" = c("High school or less", "Some college", "Other"),
                                           "College Degree or Higher" = c("College degree", "Graduate degree")))
summary(cah4$attractive)
table(cah4$educ)

In [None]:
cah4 %>%
  ggplot(aes(x=educ, y= attractive, fill=educ)) +
    geom_boxplot() +
    scale_fill_manual(values=c("#52C87d", "#26d5b8"))  +
    labs(y = "self-rating of attractiveness",
         x = "",
         title = "Distribution of self-rating of attractiveness by Education",
         subtitle ="Means indicated in orange") +
    theme(legend.position = "none", text = bold.14.text) +
    # add dots that indicate the group means
    stat_summary(fun=mean, colour="#ff5733", geom="point", 
                 shape="circle", size=7) +
    # add text that indicate the group means
    stat_summary(fun=mean, colour="#ff5733", geom="text", aes(label = round(after_stat(y), digits=1)), 
                  vjust=1, hjust = -0.2, size = 10)

In the boxplot it looks like there is little to no difference in median (or mean) of self-rated attractiveness by level of education.  But let's run our significance test to confirm.


### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

Random and independent sample ->>> Yes

Sample is <10% of the population? ->>> Yes

Sampling distribution is normally distributed? ->>> Doesn't matter - we're running a non-parametric test

Are the variances of each sample equal? ->>> Doesn't matter for this test


### Step 3 - Calculate Mann-Whitney U

In [None]:
# the wilcox.test with paired = FALSE conducts Mann-Whitney
wilcox.test(cah4$attractive ~ cah4$educ, paired = FALSE)

#### Conclusion:
1. The p-value is greater than an alpha of 0.05, therefore we fail to reject the null hypothesis.  There is no significant difference in median by groups.
2. The result suggests that there is no difference in self-rated attractiveness by education level.
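To see what the test is doing under the hood, here is a sketch on simulated stand-in data (`x` and `y` are hypothetical). The W statistic that `wilcox.test()` reports can be rebuilt from ranks: rank all observations together, sum the ranks of the first group, and subtract the smallest possible rank sum for that group.

```r
# Simulated stand-in data (hypothetical; NOT the CAH poll data)
set.seed(3)
x <- rnorm(40, mean = 5,   sd = 2)  # first group
y <- rnorm(50, mean = 5.5, sd = 2)  # second group

# Rank all observations together, then sum the ranks of the first group
r  <- rank(c(x, y))
n1 <- length(x)
w_by_hand <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Mann-Whitney / Wilcoxon rank-sum test
wt <- wilcox.test(x, y, paired = FALSE)

c(by_hand = w_by_hand, wilcox.test = unname(wt$statistic))
```

Because only the ranks are used, the test does not depend on the data being normally distributed.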

#### Unstandardized Effect Size
There appears to be zero difference in median and only a trivial difference in means.

#### Standardized Effect Size
The non-parametric test has no standardized effect like Cohen's d.  But we can treat it as if we were comparing the means and run Cohen's d and r-squared.  It's important to note that this effect size would refer to the test of means, not medians.

In [None]:
# use the cohen.d() function, treating the ordinal ratings as numeric
# cohen.d(outcomevariable_numeric ~ predictorvariable_categorical)
cohen.d(cah4$attractive ~ cah4$educ)

In [None]:
# effect size - rsquared
att <- t.test(cah4$attractive ~ cah4$educ)
# the $statistic of a saved t-test object is the t-value.  The $parameter is the degrees of freedom
# rsquared of t-test is t-squared over t-squared plus df
rsq4 <- att$statistic^2 / (att$statistic^2 + att$parameter)
names(rsq4) <- "r-squared" # re-label the value from t to rsq
rsq4 # proportion
percent(rsq4, accuracy = .01) # format proportion as percentage

The standardized effect size measures also support a conclusion that there is no substantive effect of education on self-rated attractiveness.

[Return to Top](#top)
<a id = "power"></a>

## Power Analysis

In the first example we found no statistically significant difference in belief of what the federal minimum wage is by race.  Let's see what the power of that analysis was, to see if we had enough power to limit our probability of Type II error.

**IMPORTANT:** <br>
**1. For a two-sample t-test the n (sample size) provided to `pwr.t.test()` should be the size of the smaller of the two groups, not the total sample size.** <BR>
**2. For a t-test the effect size in the pwr function MUST be Cohen's d.  Do not try to run this with r-squared, you will not get the right result.**

In [None]:
es_d <- cohen.d(cah1$fed_min_is ~ cah1$race)$estimate # cohen.d() has no var.equal argument; it pools the SD by default
small_n <- min(table(cah1$race)) #min of a one-way table of the grouping variable gives us the size of the smallest group

pwr.t.test(n = small_n, d = es_d, sig.level = 0.05, power = NULL, alternative = "two.sided")

We see that for our sample size and the very small effect size we have only a power of 0.179.  This means we have an 82% chance of Type II error.  What sample size would we need to have a power of 0.8, and therefore only a 20% chance of Type II error?

In [None]:
pwr.t.test(n = NULL, d = es_d, sig.level = 0.05, power = 0.8, alternative = "two.sided")

We would need 954 observations **in each group** to have a power of 0.8 with a Cohen's d of only 0.13.  Note, if we wanted to detect an even smaller effect, we would need even more observations.
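For a rough sense of where these power numbers come from, the sketch below computes the power of a two-sided, two-sample t-test directly from the noncentral t distribution in base R. It plugs in the rounded values quoted above (d = 0.13 and 954 per group), so the result is only approximately 0.8 rather than an exact match to the `pwr.t.test()` output.

```r
# Values taken from the text above (d is rounded, so the result is approximate)
d     <- 0.13   # Cohen's d
n     <- 954    # observations PER GROUP
alpha <- 0.05

df    <- 2 * (n - 1)          # degrees of freedom for two equal-sized groups
ncp   <- d * sqrt(n / 2)      # noncentrality parameter
tcrit <- qt(1 - alpha / 2, df) # two-sided critical value

# Power: probability the t statistic falls beyond either critical value
power <- pt(tcrit, df, ncp = ncp, lower.tail = FALSE) + pt(-tcrit, df, ncp = ncp)
power
```

Lowering `d` or `n` in this sketch shows how quickly power drops for small effects.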

[Return to Top](#top)
<a id="viz"></a>

## Varieties of Visualizations
We have already seen three different ways to visualize differences in distributions, means, proportions, or medians by two groups throughout this lab:

- [geom_density](#density): The shape of the distributions - used to show overall difference in distributions.
- [geom_bar](#bar): used when we have proportions to show difference in proportions with 95% CI error bar.
- [geom_boxplot](#box): used to show difference in distributions, medians, and IQR among groups.  Can also add indicator for mean.

There are a variety of other ways to visually display this information, and the best type of visualization can depend on the nature of your data.

1) A **histogram** of the distribution by group.  In this case it shows how non-normal the self-rated attractiveness data was that necessitated the use of the non-parametric test.  But this is a universally acceptable visualization to compare two distributions.

In [None]:
cah5 <- cah4 %>% mutate(educ = fct_relevel(educ, rev))
cah5 %>%
  ggplot(aes(x=attractive, fill=educ)) +
    ## bins indicates the number of bars to break the distribution into
    ## alpha indicates the level of transparency of the bars (To see the bars behind each other)
    geom_histogram(bins = 15, alpha = 0.5, position = "identity") +
    scale_fill_manual(values=c("#FF5733", "#26d5b8"))  +
    labs(fill="Education",
         y = "Frequency",
         x = "self-rated attractiveness",
         title = "Distribution of self-rated attractiveness by education level") +
    theme(legend.position = "bottom", text = bold.14.text) 

2) A **"violin" plot** that shows both the density of the distribution AND a box plot, by group.

In [None]:
v <- cah4 %>% ggplot(aes(x = educ, y = attractive, fill = educ)) + 
            geom_violin() +
            geom_boxplot(width=0.1, fill = "black", color = "white", size = 2)+
            scale_fill_manual(values=c("#FF5733", "#26d5b8"))  +
            labs(fill="Education",
                 y = "self-rated attractiveness",
                 x = "",
                 title = "Distribution of self-rated attractiveness",
                 subtitle = "By level of education") +
            theme(legend.position = "none", text = bold.14.text) +
            coord_flip()
v

3) A **mirrored density plot**, which works particularly well for paired t-tests.  You don't even have to transform the data to long form first!

In [None]:
# labels are drawn once with annotate() rather than once per row with geom_label()

p <- ggplot(cah2) +
  # Top
  geom_density( aes(x = fed_min_is, y = after_stat(density)), fill="#26d5b8" ) +
  annotate("label", x=28, y=0.05, label="Federal Minimum Wage is", color="#26d5b8", size = 6, fill = "black") +
  # Bottom
  geom_density( aes(x = fed_min_should, y = -after_stat(density)), fill= "#ff5733") +
  annotate("label", x=27, y=-0.05, label="Federal Minimum Wage should be", color="#ff5733", size = 6, fill = "black") +
  xlab("Hourly Wage")+
  labs(title = "Distribution of Federal Minimum Wage",
       subtitle = "What they believe it is vs. what they believe it should be") +
  scale_x_continuous(labels = dollar) +
  theme(text = bold.14.text)
p



[Return to Top](#top)
<a id="prac"></a>

## Practice Problem - Your Turn!

You will look at the variable `fed_tax_is` which is the answers to the question - "If you had to guess, in percentage, what do you believe the federal tax rate is for individuals making more than 500 thousand dollars per year?"  You will see if the mean believed federal tax rate differs by level of `educ`.  

- Start with the df `cah` and remove NA values on this variable and any observations with values larger than 100.
- Factor and collapse `educ`.  You can see the fourth example in this notebook for the code to collapse educ levels into only 2 groups.
- Graph a visualization of the difference in distributions
- Determine if the mean of guesses is significantly different by education level.
- Determine if the result is substantively significant - looking both at the unstandardized and standardized effect sizes.