# Two-Way ANOVA
<a id = "top"></a>
This lab will take you through two complete examples of conducting two-way ANOVA analyses using the Cards Against Humanity poll data we've used in previous labs.  Each example will go through the process of preliminary data inspection/visualization, checking assumptions, conducting the ANOVA analysis, interpretation, post-hoc assumption checking, post-hoc pairwise comparisons, and effect size calculation and interpretation.  

We will also look at a few special cases - paired data and two-way ANOVA without interactions.

### Table of Contents
- [Example 1: Income by Education Level and Race](#ex1)
    - [Preliminary Inspection](#prelim1)
    - [Checking Assumptions](#assump1)
    - [ANOVA Analysis](#anova1)
    - [Post-hoc Assumption Check](#postchk1)
    - [Pairwise Comparisons (Tukey HSD)](#pair1)
    - [Effect Size](#eff1)
- [Example 2: Attractiveness by Age Range and Gender](#ex2)
    - [Preliminary Inspection](#prelim2)
    - [Checking Assumptions](#assump2)
    - [ANOVA Analysis](#anova2)
    - [Post-hoc Assumption Check](#postchk2)
    - [Pairwise Comparisons (Bonferroni)](#pair2)
    - [Effect Size](#eff2)
- [Special Case](#special)
    - [Paired Measures - the element of time](#paired)
- [PQ format](#pqform)

In [None]:
# LIBRARIES
library(tidyverse)
library(magrittr) ## for pipe operators
library(pwr) ## for power function and ES.h (Cohen's h)
library(scales) ## for scaling functions for ggplot2
library(effsize) ## for Cohen's D
library(DescTools) ## for LeveneTest() and EtaSq() (also has non-parametric tests)
library(rcompanion) # for EpsilonSquared function
library(gridExtra) # for side-by-side plots
library(knitr) 
library(kableExtra) # to make kables
library(gt) # new package for PQ tables

## bold text specification for ggplot
bold.14.text <- element_text(face = "bold", size = 14)

options(repr.plot.width=14, repr.plot.height=5) ## set options for plot size within the notebook -
# this is only for jupyter notebooks, you can disregard this.

In [None]:
## LOAD the DATA
cah <- read_csv("201806-CAH_PulseOfTheNation_Raw.csv")
## variable names currently full questions - need to rename
new_names <- c("gender", "age", "agerange", "race", "income", "educ", "partyid", "polaffil", 
               "trump", "hollymoney", "fed_min_is", "fed_min_should", "fed_tax_is", "fed_tax_should", 
               "redist", "redist_you", "redist_people", "baseincome", "faircomp", "ceofair", "attractive")
colnames(cah) <- new_names
glimpse(cah)

In [None]:
#question text
spec(cah)

<a id = "ex1"></a>
## Example 1: Income by Education Level and Race

Again we're going to use data from a Cards Against Humanity poll, this time from June 2018.  For the first example we're going to see if the mean of income differs by education level.  We have strong reason to believe this might be true - higher education is generally related to obtaining a higher paying job.  For this lab we're going to also look at the impact of race - which has been previously found to be related to income as well. 

Before we start we need to do a bit of data cleaning on our three variables.  Note - I'm not removing NA for variables I'm not using - this is important to maintain the power of your analysis.
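
To see why it matters which variables you drop NAs for, here is a minimal sketch on a made-up data frame. The `toy` data and its values are hypothetical (not from the CAH poll), and I use base R's `complete.cases()` as a stand-in for `drop_na()`.

```r
# Hypothetical toy data: 6 respondents, with NAs scattered across columns
toy <- data.frame(
  income = c(50000, NA, 72000, 31000, 45000, 88000),
  educ   = c("College degree", "Some college", "Some college",
             "Graduate degree", "High school or less", "College degree"),
  trump  = c("Approve", "Disapprove", NA, NA, "Approve", NA)  # not used in this analysis
)

# Keep only rows that are complete in EVERY column (too aggressive)
n_all_vars <- sum(complete.cases(toy))

# Keep rows that are complete in the variables we will actually analyze
n_used_vars <- sum(complete.cases(toy[, c("income", "educ")]))

n_all_vars   # rows kept when every column must be complete
n_used_vars  # rows kept when only income and educ must be complete
```

In this toy example, requiring every column to be complete keeps far fewer rows than requiring only the analysis variables to be complete, and fewer rows means less power.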

In [None]:
## data cleaning for income and education level
cah1 <- cah %>% drop_na(income) %>% 
                filter(!educ %in% c("DK/REF", "Other")) %>% 
                mutate(educ = fct_relevel(educ, "High school or less", "Some college", "College degree", "Graduate degree"))  %>% 
                mutate(race = fct_lump(race))
summary(cah1$income)
cah1 %$% table(educ, race)

<a id = "prelim1"></a>
### Preliminary Inspection
The first part of any analysis is to inspect and visualize your data so that you know what you're working with.  We've looked at basic summary data in the code above when we did the data cleaning.  I looked at the two-way table of the two categorical variables to make sure I had observations in each of the cells.  If any cell in the two-way table of the two categorical predictors has no observations, we cannot include an interaction effect in the ANOVA model.  We're good though, see above.
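
The empty-cell check can also be done in code rather than by eye. This is a minimal sketch on hypothetical toy data (the `toy` data frame and its shortened level names are made up for illustration); the same idea should apply to the `educ` by `race` table from the cleaning step.

```r
# Hypothetical toy data: no "Other" respondents with a graduate degree
toy <- data.frame(
  educ = factor(c("HS", "HS", "College", "College", "Graduate", "Graduate")),
  race = factor(c("White", "Other", "White", "Other", "White", "White"))
)

# Two-way table of the two categorical predictors
cell_counts <- table(toy$educ, toy$race)
cell_counts

# TRUE means at least one educ-by-race combination has no observations
has_empty_cell <- any(cell_counts == 0)
has_empty_cell
```

If `has_empty_cell` comes back `TRUE`, the interaction cannot be fully estimated, and collapsing factor levels is one way to fill the gap.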

Next, we should look at a couple of graphs and get a feel for what the distribution of income looks like within and between education levels AND race.  

First, I'm going to look at a density plot to see what the distributions look like - looking for things like normality and skewness as well as seeing how much the distributions overlap.  Similar to in the lecture, I'm going to use grid.arrange() to show my two plots side by side in order to more easily compare.

In [None]:
## density plot
den_educ <- cah1 %>%
  ggplot( aes(x=income/1000, fill=educ)) +  ## divide income by 1000 to make the axes tick marks more readable.
    geom_density(alpha=0.6) +
    scale_fill_manual(values=c("#26d5b8", "#ff5733", "magenta", "blue")) +
    labs(fill= "Education Level",
         y = "Density",
         x = "Income in $1000s",
         title = "Distribution of Income by Education Level")

den_race <- cah1 %>%
  ggplot( aes(x=income/1000, fill=race)) +  ## divide income by 1000 to make the axes tick marks more readable.
    geom_density(alpha=0.6) +
    scale_fill_manual(values=c("#26d5b8", "#ff5733")) +
    labs(fill= "Race/ethnicity",
         y = "Density",
         x = "Income in $1000s",
         title = "Distribution of Income by Race/Ethnicity")

grid.arrange(den_educ, den_race, ncol = 2)

It's clear that there is some difference between the groups, but it's hard to see because of the extreme outliers.  Let's drop those out of the graph for now so we can better visualize what's going on in the bulk of the distributions.

In [None]:
## density plot
cah_income <- cah1 %>% filter(income < 200000) %>% mutate(educ = fct_relevel(educ, rev))

den_educ2 <- cah_income %>% 
  ggplot( aes(x=income/1000, fill=educ)) +  ## divide income by 1000 to make the axes tick marks more readable.
    geom_density(alpha=0.6) +
    labs(fill= "Education Level",
         y = "Density",
         x = "Income in $1000s",
         title = "Distribution of Income by Education Level")

den_race2 <- cah_income %>%
  ggplot( aes(x=income/1000, fill=race)) +  ## divide income by 1000 to make the axes tick marks more readable.
    geom_density(alpha=0.6) +
    scale_fill_manual(values=c("#26d5b8", "#ff5733")) +
    labs(fill= "Race/ethnicity",
         y = "Density",
         x = "Income in $1000s",
         title = "Distribution of Income by Race/Ethnicity")

grid.arrange(den_educ2, den_race2, ncol = 2)

I reviewed the density plot for education in the last lab - so you can review that for a more detailed interpretation.

For race, contrary to what one might expect due to previous research, it appears that non-White individuals in our sample might have a higher average income - based on the lower peak between 0-50k and the fatter upper tail.

It's hard to determine the location of the mean/median in this view, so let's also make a boxplot to look at that information.

In [None]:
#boxplot
box_educ <- cah1 %>% filter(income < 200000) %>% 
  ggplot(aes(y=income/1000, x=educ, fill=educ)) +  ## divide income by 1000 to make the axes tick marks more readable.
    geom_boxplot() +
    stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., color = "mean"),
                 width = 0.75, linetype = "solid", size = 2) +  ## this one adds the line that indicates the group mean.
    scale_fill_manual(values=c("#26d5b8", "#ff5733", "magenta", "blue")) +
    scale_color_manual(values = "#39ff14")+ ## makes the group mean line green - doing it this way forces it to show in the legend
    labs(fill= "Education Level",
         y = "Income in $1000s",
         x = "Education Level",
         color = "Group Mean",
         title = "Distribution of Income by Education Level")

box_race <- cah1 %>% filter(income < 200000) %>% 
  ggplot(aes(y=income/1000, x=race, fill=race)) +  ## divide income by 1000 to make the axes tick marks more readable.
    geom_boxplot() +
    stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., color = "mean"),
                 width = 0.75, linetype = "solid", size = 2) +  ## this one adds the line that indicates the group mean.
    scale_fill_manual(values=c("#26d5b8", "#ff5733")) +
    scale_color_manual(values = "#39ff14")+ ## makes the group mean line green - doing it this way forces it to show in the legend
    labs(fill= "Race/Ethnicity",
         y = "Income in $1000s",
         x = "Race/Ethnicity",
         color = "Group Mean",
         title = "Distribution of Income by Race/Ethnicity")

grid.arrange(box_educ, box_race, ncol = 2)

Again, I limited this to not include any income outliers - any responses greater than 200k.  

I looked in depth at the educ boxplot in Lab 6 - so refer back to that for the interpretation.

As for differences in income by race, it does appear that the mean/median income for the "Other" group is slightly higher than for the "White" group (but perhaps not significantly so).  The spread of the distribution (IQR) also appears to be similar, indicating that we may not violate the assumption of equal variance; however, we still have to formally test that.

Finally, since we now have two variables, we should look at potential interactions.  Even though race may not be a significant predictor by itself (main effect) it could have a significant interaction with education as a predictor for income.

In [None]:
options(repr.plot.width=10, repr.plot.height=5) ## set options for plot size within the notebook -
# this is only for jupyter notebooks, you can disregard this.


cah_income %>% 
  ggplot(aes(x = race, color = educ, group = educ, y = income/1000)) +  ## divide income by 1000 so the values match the "Income in $1000s" axis label
  stat_summary(fun.y = mean, geom = "point", size = 5) +
  stat_summary(fun.y = mean, geom = "line", size = 2) +
  labs(title = "Average Income by Race and Education",
       x = "Race/Ethnicity",
       y = "Income in $1000s",
       color = "Education") +
  theme(text = bold.14.text)

This is potentially interesting.  The lines for Graduate degree, College Degree, and Some college are all roughly parallel and therefore do not suggest an interaction, however, High school or less intersects with Some college - suggesting a potential interaction effect there.  We will still need to see if it's large enough to be statistically significant.

[Return to Top](#top)
<a id = "assump1"></a>
### Checking Assumptions
ANOVA analysis has 6 assumptions, 5 of which we can review/check before our analysis.

The basic ones we don't need to test but just need to confirm (possibly through looking at survey/data documentation):
1. Dependent variable is numeric - **income is numeric**
2. Group sample sizes are approximately equal - **if you look back at when we did our data cleaning, each education group had roughly 100 observations, race groups are not as equal, but are close enough in magnitude to ignore**
3. Independence of observations - **the observations come from independent respondents who were randomly selected**
4. No extreme outliers - **We do have extreme outliers!  We will remove them from the data moving forward so that we do not violate this assumption (and so they don't unduly bias our results)**

In [None]:
# remove outliers
cah1 %<>% filter(income < 200000)
summary(cah1$income)

Our last pre-check item:
5. Homogeneity of variance - the within group variance for each of the groups should be equal. **We will check this with Levene's Test**

The statistical hypotheses for Levene's Test are:

$H_0:$ The variances in the groups are equal. <BR>
$H_A:$ The variances in the groups are not equal.

In this test, we sort of want to fail to reject null, because it's easier if our variances are equal and we don't need to make the adjustment.

REMEMBER: This is a pre-test to check this assumption.  This is NOT your ANOVA analysis!!!!

We need to check for each predictor.

In [None]:
#LeveneTest(DV ~ IV, data = your data frame)

LeveneTest(income ~ educ, data = cah1)
LeveneTest(income ~ race, data = cah1)
LeveneTest(income ~ educ:race, data = cah1) ## interaction groups

We now need to interpret each test:

1. Education: As we may have expected from reviewing our density graph and boxplots, the within group variances are not equal.   Because the p-value of Levene's Test is less than alpha = 0.05, that means that **we are violating the assumption of homogeneity of variance.**
2. Race: The p-value is higher than alpha, meaning that we fail to reject the null hypothesis.  This is what we expected based on the boxplot. **we are NOT violating the assumption of homogeneity of variance for this variable.**
3. Interaction: the p-value is less than alpha, so we reject the null hypothesis.  This is probably largely driven by the differences in variance for education, not for race. **we are violating the assumption of homogeneity of variance.**

While overall we are violating the assumption, the function we would use to address this, `oneway.test()`, is not as flexible as `aov()` in terms of what we can do post-analysis.  Therefore we will proceed using `aov()` with the acknowledgement that we ARE violating this assumption.
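
For reference, this is roughly what the `oneway.test()` alternative looks like. It is a sketch on simulated data (not the CAH poll), with made-up group names, comparing Welch's version to the classic equal-variance version.

```r
set.seed(42)  # simulated data, not the CAH poll

# Three hypothetical groups with the same mean but very different spreads
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 50)),
  y     = c(rnorm(50, mean = 10, sd = 1),
            rnorm(50, mean = 10, sd = 3),
            rnorm(50, mean = 10, sd = 6))
)

# Welch's one-way ANOVA: var.equal = FALSE (the default) drops the
# equal-variance assumption
welch <- oneway.test(y ~ group, data = sim, var.equal = FALSE)
welch

# Classic one-way ANOVA for comparison (assumes equal variances)
classic <- oneway.test(y ~ group, data = sim, var.equal = TRUE)
classic
```

Notice that the Welch version reports a smaller (and usually fractional) denominator degrees of freedom, which is how it pays for not assuming equal variances.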

**IMPORTANT**: This is a test of variance, but this is **NOT** the ANOVA test.  This just tests the assumption of homogeneity of variances.  We cannot use these results to make inference about our means.
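
To make the point concrete, here is a sketch of what a median-centered Levene's test is doing under the hood: a one-way ANOVA on each observation's absolute deviation from its group median. The data are simulated, and I'm assuming the median-centered version (which I believe is the default in `LeveneTest()`), so treat this as an illustration rather than a replacement for the function.

```r
set.seed(1)  # simulated data, not the CAH poll

# Two hypothetical groups: same mean, very different spread
sim <- data.frame(
  group = factor(rep(c("narrow", "wide"), each = 100)),
  y     = c(rnorm(100, mean = 50, sd = 2),
            rnorm(100, mean = 50, sd = 10))
)

# Step 1: absolute deviation of each observation from its group median
sim$abs_dev <- abs(sim$y - ave(sim$y, sim$group, FUN = median))

# Step 2: one-way ANOVA on those absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
levene_by_hand

# The p-value tests equality of spread, not equality of means
levene_p <- levene_by_hand[["Pr(>F)"]][1]
levene_p
```

The two simulated groups have the same mean, yet the test rejects, because it only looks at how far observations sit from their own group's center.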

We cannot test our last assumption until after our analysis, so let's move forward!

[Return to Top](#top)
<a id = "anova1"></a>
### ANOVA Analysis
We've checked off our pre-flight to do list, now we're ready for lift off!  And R makes it easy to conduct an ANOVA analysis - we just need one line of code!

Up front, let's review our statistical hypotheses - we have three sets - one for each of the F-tests we'll have in our model.

Education:

$H_0:$ There is no difference in average income between education levels. <BR>
$H_A:$ There is at least one significant difference between average income by education level.
    
Race: 

$H_0:$ There is no difference in average income between races. <BR>
$H_A:$ There is at least one significant difference between races.
    
Interaction:

$H_0:$  There is no significant interaction between race and education in predicting income.<BR>
$H_A:$ There is a significant interaction between race and education.  At least one of the groups that comprises the intersection of race and education has a significantly different average income than the others.
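
Each of the three hypothesis sets corresponds to one term in the model formula. As a quick sketch, `terms()` shows how R expands the different formula shorthands (no data are needed for this, since it only parses the formula).

```r
# Expand each formula into the model terms R will fit.
# terms() only parses the formula, so the variables do not need to exist.
full_terms        <- attr(terms(income ~ educ * race), "term.labels")
main_only_terms   <- attr(terms(income ~ educ + race), "term.labels")
interaction_terms <- attr(terms(income ~ educ:race), "term.labels")

full_terms         # "educ" "race" "educ:race"
main_only_terms    # "educ" "race"
interaction_terms  # "educ:race"
```

So `educ * race` is shorthand for `educ + race + educ:race`, which gives one F-test per term.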

In [None]:
# aov(DV ~ IV, data = yourdf)
aov(income ~ educ*race, data = cah1)  

# educ*race asks for both main effects and the interaction
# educ + race would ask only for the main effects
# educ:race is how we could specify the interaction by itself.

**Wait! Where are our results?**

By default, this is the output of `aov()`, which is not very informative.  To get the "good stuff" we need to call the `summary()` function on the aov object.  We can do this by wrapping the entire call in `summary()`, or by saving the result of calling `aov()` to an object that you then pass to `summary()`.

In [None]:
# get full results of aov
race_educ_aov <- aov(income ~ educ*race, data = cah1)
summary(race_educ_aov)

Interesting -

1. With our p-value less than alpha for our first F-test (the main effect of education) we conclude that education IS a significant predictor of income.  This is the same as our main effect in our one-way ANOVA (that included only education as the predictor).
2. With p = 0.036 for the second F-test (the main effect of race) we conclude that race IS a significant predictor of income.  This is surprising given the small difference we saw on the boxplot, however that small difference was large enough to surpass the threshold for statistical significance.
3. The third F-test, for the interaction between education and race, is not significant (p = 0.052).  With a p-value so close to alpha, some researchers might consider this a "marginally significant" result.  We did see evidence of a potential interaction on our interaction plot, so it may be the case that we don't have enough power to detect this effect at alpha = 0.05.  The power is based on the group sizes, so the groups defined by the intersection of the two variables - White:HS Education, Other:HS Education, White:Some College, Other:Some College, etc. - might not be large enough to accurately detect this effect.

[Return to Top](#top)
<a id = "postchk1"></a>

### Post-hoc Assumption Check
We need to check one final assumption - the normality of the residuals.  Because the residuals are calculated during the ANOVA analysis (they're what make up SSW - or the residual sum of squares), we cannot check this assumption prior to the analysis.

6. Normality of Residuals - **We check this via a QQ plot of the _residuals_ from our data.**

You can run `str()` on your `aov()` result and see that inside that object are saved a number of different pieces that you can extract using $ indexing.  One of these pieces is the vector of residuals.  We can use these to create a QQ plot, in the same way we have previously created QQ plots, except this time it's of the residuals, not the observations.

In [None]:
## need to save residuals as a df so that we can use ggplot.
## we're getting the residuals from our previously saved aov object (See previous code block)
resid_df <- data.frame(resid = race_educ_aov$residuals) ## the residuals part of the aov results using $residuals

resid_df %>% ggplot(aes(sample = resid)) +
  geom_qq_line(color = "red", size = 1) +
  geom_qq(color = "black") +
  labs(title = "QQ Plot of Residuals")

Our residuals are "whatever's left" in terms of variance after all of the main and interaction effects are calculated.

Here it appears that our residuals are somewhat normally distributed.  There is a small amount of deviation from normality in the lower tail, and a more extreme amount of deviation from normality in the upper tail.

Because the deviations are both on the top side of the reference line it's indicative of the right skew we can see in our density plot of the observations.  That right skew carries over into our residuals.  We can plot a density plot of the residuals if we're curious. (not necessary)

Remember, the residuals are the individual observation deviations from the GROUP means.  They are in dollars (the same unit as the observations).
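
Here is a small sketch that verifies that statement on simulated data (a hypothetical 2 x 2 design with made-up factors `a` and `b`, not the CAH poll). When the model includes both main effects and the interaction, each residual is just the observation minus the mean of its own cell.

```r
set.seed(7)  # simulated data, not the CAH poll

# Hypothetical 2 x 2 design with 25 observations per cell
sim <- expand.grid(rep = 1:25,
                   a = c("a1", "a2"),
                   b = c("b1", "b2"))
sim$a <- factor(sim$a)
sim$b <- factor(sim$b)
sim$y <- rnorm(nrow(sim), mean = 40 + 5 * (sim$a == "a2"), sd = 8)

fit <- aov(y ~ a * b, data = sim)

# Cell mean for each observation's a-by-b group
cell_mean <- ave(sim$y, sim$a, sim$b)

# Residual = observation minus its own cell mean
by_hand <- sim$y - cell_mean

all.equal(unname(fit$residuals), by_hand)  # TRUE
mean(fit$residuals)                        # essentially 0
```

This is also why the residuals are in the same units as the outcome.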

In [None]:
# not necessary density plot of residuals.
# quick and "dirty" - not PQ

resid_df %>% ggplot( aes(x=resid)) +  
    geom_density(fill = "blue") 

This confirms the right skew - notice where 0 (the mean of the residuals) sits on the x axis.

[Return to Top](#top)
<a id = "pair1"></a>
### Pairwise Comparisons (Tukey HSD)
Now we get to the fun part - it's time to figure out which group means (education levels) actually significantly differ from the others, through pairwise t-tests.  Recall that we need to use special procedures for these pairwise t-tests to adjust for the multiple comparisons problem - the inflation of Type I error that comes from repeatedly conducting statistical tests on the same data.  In this first example I will show Tukey HSD.  It's harder to do Bonferroni on a two-way ANOVA (although possible), so we usually use Tukey (it's just easier).

You'll get three sets of pairwise comparisons - one for the differences between levels of education, one for the difference between the two race groups (which is essentially a t-test - there are only two groups so one comparison), and one for the groups that are defined by the intersection of educ:race.

In [None]:
# TukeyHSD() with saved aov() object - we saved this in the ANOVA analysis section above.
TukeyHSD(race_educ_aov)

The output shows the difference between the group means, the lower and upper bound of the 95% CI for that _difference_ and the p-value (adjusted for the multiple comparisons).  The first column shows you the order of the subtraction done in the numerator in the t-test, so the sign of diff can show you the direction of the difference.  

We get A LOT of pairwise comparisons - especially for the interactions.

Starting at the top, we get the pairs of the different groups of education, which is the same output we saw in the last lab.

Next, we see the one pair for race (because we only have two groups - white and other).  That is significant and has the same p-value as the F-test in the ANOVA model.

Finally we see all of the pairwise comparisons for educ:race. Even though our overall F-test was not significant, some of the pairs are significantly different.  This may be due to that lack of power I discussed previously.
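
With that many rows, it can help to treat the Tukey output as data instead of reading it by eye. This sketch uses R's built-in `warpbreaks` data (not the CAH poll) purely to show the structure of a `TukeyHSD()` result and one way to filter it; the 0.05 cutoff is just the alpha used in this lab.

```r
# Built-in example data (not the CAH poll): breaks by wool (2 levels)
# and tension (3 levels)
wb_aov   <- aov(breaks ~ wool * tension, data = warpbreaks)
wb_tukey <- TukeyHSD(wb_aov)

names(wb_tukey)             # one table per model term
colnames(wb_tukey$tension)  # diff, lwr, upr, p adj

# Pull out only the interaction pairs with adjusted p < 0.05
inter <- as.data.frame(wb_tukey[["wool:tension"]])
inter[inter[["p adj"]] < 0.05, ]

# Number of pairwise comparisons among k groups is choose(k, 2)
n_pairs_educ  <- choose(4, 2)      # 4 education levels
n_pairs_cells <- choose(4 * 2, 2)  # 8 educ-by-race cells
n_pairs_educ
n_pairs_cells
```

The last two lines show why the interaction section is so long: 4 education levels give 6 pairs, but the 8 education-by-race cells give 28.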

[Return to Top](#top)
<a id = "eff1"></a>
### Effect Size
We now need to look at the magnitude of these differences to see if they're substantively significant on top of being statistically significant.

#### Unstandardized Effect Size
Unstandardized Effect Size is always the difference between the means in the units of the observations.  Because it's in the units of the observations it's unstandardized - which means we can't compare between different analyses - we can't compare $10,000 to a difference of 2 in attractiveness on a range from 1-10.  Just because 2 is much smaller than 10,000 doesn't mean that the magnitude of the difference is any less.

Here we can look at the difference in means from our Tukey HSD output.

I discussed the differences in income by education in the last lab - I concluded that they were substantively significant.

The difference by race - $8k - is probably not too large in terms of annual income and at the most would be a very small substantive difference, if not negligible.

For the interactions, the magnitude of the differences between the pairs range from as low as about 40 dollars (not at all substantive) to as high as about $60k (very substantive).

Since our unstandardized effects are all over the place, let's turn to some standardized effect sizes that can tell us overall about our entire model.

#### Standardized Effect Size - R-squared.
Unlike in previous statistical tests (where we used Cramer's V and Cohen's d), r-squared doesn't tell us about the magnitude of the differences.  It tells us how much of the variance in income can be explained by THE MODEL.  In other words - is the combination of education, race, and their interaction a substantive predictor of income?  An IV can be a significant predictor of the outcome while also not accounting for much variance in the outcome.

The formula for r-squared becomes more complicated when we have multiple predictors, so I'm going to use the linear regression function (I told you ANOVA is just a special case of linear regression!) and extract that piece of information.

In [None]:
# calculate r-squared
er_lm <- lm(income ~ race*educ, data = cah1)
rsq <- summary(er_lm)$r.squared 
rsq # proportion
percent(rsq, accuracy = .01) # percentage

The overall model (the combination of educ, race AND educ:race) explains about 16% of the variance in income.  This is a decent amount of variance explained, which means that these variables are good predictors of income.  However, we had previously seen in the one-way ANOVA lab that education by itself had an r-squared of 13%.  So race (and the interaction term) may not be adding much to the model.  We can look at partial eta-squared to see which variable(s) explain the most variance.

In [None]:
eta2 <- EtaSq(race_educ_aov) ## give eta-squared the saved anova output
eta2 # print the entire eta-squared output

Here we interpret the last column - the partial eta-squared.

As we saw in the previous lab - by itself education explains 13% of the variance in income.  Race only explains about 1% and the interaction only explains about 2%.  So education is the strongest predictor of income in the overall model (out of the overall r-squared of about 16%).
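
If you want to see where these numbers come from, they can be rebuilt from the sums of squares in the ANOVA table. This is a sketch on the built-in, balanced `warpbreaks` data (not the CAH poll). It uses the sequential sums of squares from `summary()`; `EtaSq()` may use a different type of sums of squares, so the two can differ when the data are unbalanced.

```r
# Built-in balanced example data (not the CAH poll)
wb_aov <- aov(breaks ~ wool * tension, data = warpbreaks)
tab    <- summary(wb_aov)[[1]]  # the ANOVA table as a data frame

ss        <- tab[["Sum Sq"]]
row_names <- trimws(rownames(tab))  # row names are padded with spaces
is_resid  <- row_names == "Residuals"

ss_effect <- setNames(ss[!is_resid], row_names[!is_resid])
ss_resid  <- ss[is_resid]
ss_total  <- sum(ss)

eta_sq         <- ss_effect / ss_total                # share of total variance
partial_eta_sq <- ss_effect / (ss_effect + ss_resid)  # other effects removed from the denominator
r_squared      <- 1 - ss_resid / ss_total             # whole-model variance explained

round(cbind(eta_sq, partial_eta_sq), 3)
r_squared
```

With sequential sums of squares, the plain eta-squared values add up to the model r-squared. The partial values use a smaller denominator, so they are never smaller than the plain ones and do not have to add up to r-squared.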

#### Cohen's $f$
The other effect size statistic we will use is Cohen's $f$.  Cohen's $f$ is primarily needed because it is the effect size used in power calculations. Cohen's $f$ can be interpreted similarly to Cohen's d (with the rule of thumb cutoffs), but is now the averaged magnitude of the differences in means (because now we have many pairwise differences). 

R-squared is preferred for "interpretation" purposes.  Cohen's $f$ is calculated using the r-squared value.  And Cohen's $f$ is **required** for the power analysis.  You cannot use r-squared as the effect size in the `pwr` package's ANOVA power function, `pwr.anova.test()`.

## $f = \sqrt{\frac{r^2}{1 - r^2}}$

In [None]:
## calculate cohen's f using saved value of rsq
cohenf <- sqrt(rsq / (1-rsq))
cohenf

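Cohen's $f$ is approximately 0.4.  Using the Cohen's d rule-of-thumb cutoffs, this sits between a small value (0.2) and a medium value (0.5), so I could consider it small/medium (or smedium in Adrianne-speak).  Note that Cohen's conventional benchmarks specific to $f$ are lower (0.10 small, 0.25 medium, 0.40 large), and by those benchmarks this effect would count as large.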

Overall, I'd conclude that the difference in average income by education and race is both significant and substantive.

Let's move on to our second example.

[Return to Top](#top)
<a id = "ex2"></a>

## Example 2: Attractiveness by Age Range and Gender
This example will not have as much discussion of the concepts and will be more focused on doing the analysis and the interpretation of the analysis.  If you need more information on the concepts definitely look at Example 1.  

In this example we'll use the same CAH poll data, but we'll look at different variables.  Attractiveness, which is a 1-10 self-rating of one's attractiveness (_This next question is about your physical appearance, and you may choose not to respond if it makes you uncomfortable. On a scale of 1-10, how physically attractive are you?_), is the outcome/numerical variable, and age range (categorical age) and gender are the predictors.  We seek to see if age and/or gender influences how attractive people think they are.

Attractiveness is a rating on 1-10, so may be considered more ordinal than numerical, but it's got a big enough range to do numerical analysis using it.

Throughout this example I'll use some alternative ways of doing things (different types of graphs, the other type of pairwise tests), so make sure you review both examples.

In [None]:
## data cleaning
cah2 <- cah %>% filter(attractive != "DK/REF" & !gender %in% c("DK/REF", "Other") ) %>% 
                mutate(attractive = replace(attractive, attractive == "Not attractive at all", "1")) %>% 
                mutate(attractive = replace(attractive, attractive == "Very attractive", "10")) %>% 
                mutate(attractive = as.numeric(attractive), agerange = factor(agerange)) %>%
                # collapse agerange groups so they're more equal in size when cut by gender
                mutate(agerange = fct_collapse(agerange, 
                                                "18-44" = c("18-24", "25-34", "35-44"),
                                                "45-64" = c("45-54", "55-64"),
                                                "65+" = "65+"))
summary(cah2$attractive)
cah2  %$% table(agerange, gender)

[Return to Top](#top)
<a id = "prelim2"></a>
### Preliminary Inspection
Our first part is always preliminary inspection.  This time I'm going to use a violin plot that shows both the density and the boxplot on one graph.

In [None]:
options(repr.plot.width=14, repr.plot.height=5) ## set options for plot size within the notebook -
# this is only for jupyter notebooks, you can disregard this.

viol1 <- cah2 %>% ggplot(aes(x = agerange, y = attractive, fill = agerange)) + 
            geom_violin() +
            geom_boxplot(width=0.1, fill = "white", color = "black", size = 1)+
            stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., color = "mean"),
                 width = 0.75, linetype = "solid", size = 2) +
            scale_color_manual(values = "#39ff14")+
            labs(fill="Age Range",
                 y = "self-rated attractiveness",
                 x = "",
                 title = "Distribution of self-rated attractiveness by Age Range",
                 color = "Group Mean") +
            theme(legend.position = "bottom", text = bold.14.text) +
            ylim(0,10)

viol2 <- cah2 %>% ggplot(aes(x = gender, y = attractive, fill = gender)) + 
            geom_violin() +
            geom_boxplot(width=0.1, fill = "white", color = "black", size = 1)+
            stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., color = "mean"),
                 width = 0.75, linetype = "solid", size = 2) +
            scale_color_manual(values = "#39ff14")+
            labs(fill="Gender",
                 y = "self-rated attractiveness",
                 x = "",
                 title = "Distribution of self-rated attractiveness by Gender",
                 color = "Group Mean") +
            theme(legend.position = "bottom", text = bold.14.text) +
            ylim(0,10)

grid.arrange(viol1, viol2, ncol = 2)

First, we'll look at the density part of the graph - the colored "violin shape" - which is the same thing as our density graph, but mirrored on either side of the box plot (which is why it's thought to look like a violin, I'll leave to your own imagination what it may actually look like...).  For age, it appears that the distribution of self-rated attractiveness is most normal among the youngest group, and bimodal and skewed in the other two.  Both distributions by gender are bimodal, with many observations concentrated around 5 and 7.5.

Moving to the boxplot portion of the graph, I've also added the neon green line here to indicate the group mean.  There seems to be a small difference in mean self-rated attractiveness by age - the scores lowering as age increases.  There does not seem to be a sizeable difference in average self-rated attractiveness by gender.  

We still should check to see if there are any apparent interactions.

In [None]:
options(repr.plot.width=10, repr.plot.height=5) ## set options for plot size within the notebook -
# this is only for jupyter notebooks, you can disregard this.


cah2 %>% 
  ggplot(aes(x = gender, color = agerange, group = agerange, y = attractive)) +
  stat_summary(fun.y = mean, geom = "point", size = 5) +
  stat_summary(fun.y = mean, geom = "line", size = 2) +
  labs(title = "Self-rated attractiveness by gender and age range",
       x = "Gender",
       y = "self-rated attractiveness",
       color = "Age Range") +
  theme(text = bold.14.text)

There may potentially be a significant interaction here.  This is evident in the difference between the red line and the other two lines - they are not parallel.  This suggests an interaction even though they don't intersect.

What we see is different behavior among men and women.  In the lower two age groups, the average self-rated attractiveness of men is very close (not substantially different); however, there is a HUGE difference in average self-rated attractiveness between 18-44 y/o women and 45-64 y/o women.

Just looking at the blue and green lines, there is no interaction here - they are parallel.  This means that the effect of gender on self-rated attractiveness does not vary in those two age groups.
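
"Parallel" has a simple numeric meaning: the gap between the genders is the same in each age group. Here is a sketch using completely made-up cell means (hypothetical numbers and generic group labels, NOT the CAH results) to show the calculation behind reading an interaction plot.

```r
# Hypothetical cell means (made up for illustration, not CAH results)
cell_means <- matrix(
  c(6.0, 6.5,   # age group A: Female, Male
    5.0, 5.5,   # age group B: Female, Male
    5.5, 5.0),  # age group C: Female, Male
  nrow = 3, byrow = TRUE,
  dimnames = list(agegroup = c("A", "B", "C"),
                  gender   = c("Female", "Male"))
)

# Gender gap (Male minus Female) within each age group
gender_gap <- cell_means[, "Male"] - cell_means[, "Female"]
gender_gap

# Equal gaps -> parallel lines -> no interaction between those groups
# Unequal gaps -> non-parallel lines -> possible interaction
diff(gender_gap)
```

In this toy example groups A and B have the same gap (parallel lines), while group C's gap flips sign, which is the kind of pattern that suggests an interaction.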

[Return to Top](#top)
<a id = "assump2"></a>
### Checking Assumptions
ANOVA analysis has 6 assumptions, 5 of which we can review/check before our analysis.

The basic ones we don't need to test but just need to confirm (possibly through looking at survey/data documentation):
1. Dependent variable is numeric - **it would best be considered ordinal, but it has a large enough range (1-10) to treat it as numerical in analysis**
2. Group sample sizes are approximately equal - **the group sample sizes range from ~50 to ~160. They are similar in magnitude.  I accomplished this by collapsing some of the age ranges together to create wider ranges per factor level.**
3. Independence of observations - **the observations come from independent respondents who were randomly selected**
4. No extreme outliers - **We have some skewness, but no observations that would be considered extreme**

And our pre-check assumption we'll run in R:
5. Homogeneity of variance - the within group variance for each of the groups should be equal. **We will check this with Levene's Test**

In [None]:
#LeveneTest(DV ~ IV, data = your data frame)
LeveneTest(attractive ~ agerange, data = cah2)
LeveneTest(attractive ~ factor(gender), data = cah2)
LeveneTest(attractive ~ agerange:gender, data = cah2)

For the first test (agerange), we reject the null hypothesis, indicating that the within-group variances are NOT equal.
For the second test (gender), we fail to reject the null, so the within-group variances are not significantly different (the assumption of homogeneity of variance is not violated).
The last test (the interaction) is also significant, so it too violates the assumption.
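
If you're curious what Levene's Test is doing under the hood, it is a one-way ANOVA on the absolute deviations of each observation from its group center.  Below is a minimal sketch on simulated data (the data frame `sim` and its columns are made up), assuming the median-centered version of the test, which I believe is the default for `LeveneTest()`.

```r
set.seed(42)
# Simulated data: three groups, the third with a deliberately larger spread
sim <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 50)),
  y = c(rnorm(50, sd = 1), rnorm(50, sd = 1), rnorm(50, sd = 3))
)

# Absolute deviation of each observation from its own group median
sim$abs_dev <- abs(sim$y - ave(sim$y, sim$group, FUN = median))

# One-way ANOVA on the absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
levene_by_hand
```

A small p-value here means the typical deviation from the group center differs between groups, i.e. the variances are not homogeneous.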

Our last assumption check (normality of residuals) will wait until after we run our ANOVA model.

[Return to Top](#top)
<a id = "anova2"></a>
### ANOVA Analysis
What are our statistical hypotheses this time?

Again, we have three, one for each F-test.  We have two main effects, agerange and gender, and one interaction effect.

$H_0:$ There is no difference in average self-rated attractiveness between age groups. <BR>
$H_A:$ There is at least one significant difference between average self-rated attractiveness by age groups.
    
$H_0:$ There is no difference in average self-rated attractiveness between genders. <BR>
$H_A:$ There is at least one significant difference between average self-rated attractiveness by gender.
    
$H_0:$ There is no difference in average self-rated attractiveness between age:gender combinations. <BR>
$H_A:$ There is at least one significant difference in means - there is a significant interaction between age and gender.

In [None]:
## remember we want to save our result object (aov object) because we'll need it later.
attract_aov <- aov(attractive ~ agerange*gender, data = cah2)
summary(attract_aov)

We only have one significant F-test - the one for the main effect of age range.  So self-rated attractiveness differs by age range - at least one age range is significantly different from the others.  The main effect of gender and the interaction between age and gender are not significant (p greater than alpha), so these are not significant predictors of self-rated attractiveness.

We don't know which groups among agerange significantly differ from this output, however.  We'll address that in a minute with the pairwise comparisons.  In the meantime we need to check our one post-hoc assumption.

[Return to Top](#top)
<a id = "postchk2"></a>

### Post-hoc Assumption Check
We need to check one final assumption - the normality of the residuals.  Because the residuals are calculated during the ANOVA analysis (they're what make up SSW - or the residual sum of squares), we cannot check this assumption prior to the analysis.

6. Normality of Residuals - **We check this via a QQ plot of the _residuals_ from our data.**

In [None]:
## need to save residuals as a df so that we can use ggplot.
## we're getting the residuals from our previously saved aov object (See previous code block)
resid_df2 <- data.frame(resid = attract_aov$residuals) ## the residuals part of the aov results using $residuals

resid_df2 %>% ggplot(aes(sample = resid)) +
  geom_qq_line(color = "red", size = 1) +
  geom_qq(color = "black") +
  labs(title = "QQ Plot of Residuals")

Here the distribution of the residuals appears to be approximately normal, with some deviation in the upper tail.  This could be due to under-dispersion of the data; for an explanation of how to interpret QQ plots, see http://www.ucd.ie/ecomodel/Resources/QQplots_WebVersion.html

Overall, the residuals for this model are fairly normal, despite that bit of deviation.

[Return to Top](#top)
<a id = "pair2"></a>
### Pairwise Comparisons (Bonferroni Adjustment)
Now we'll look to see which age groups are significantly different from each other on self-rated attractiveness.  Since I showed an example of Tukey HSD last time, I'll show the Bonferroni Adjustment this time.

Recall that the Bonferroni Adjustment conducts each t-test as normal, but adjusts alpha by dividing the overall alpha by the number of comparisons.  In our code output, the adjustment is instead made to the p-values (each one is multiplied by the number of comparisons), so we can still compare them to our original alpha.

Because of the way that the Bonferroni test works, we have to run each predictor (agerange, gender, agerange:gender) separately.
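
As a quick illustration of the adjustment itself, here is a minimal sketch with made-up p-values (not from our data).  Multiplying each p-value by the number of comparisons, and capping at 1, gives the same result as base R's `p.adjust()` with `method = "bonferroni"`.

```r
# Hypothetical unadjusted p-values for three pairwise comparisons
p_raw <- c(0.004, 0.030, 0.200)
m <- length(p_raw)  # number of comparisons

# Bonferroni by hand: multiply by the number of comparisons, cap at 1
p_manual <- pmin(1, p_raw * m)

# Base R's p.adjust() performs the same adjustment
p_builtin <- p.adjust(p_raw, method = "bonferroni")

all.equal(p_manual, p_builtin)
```

Comparing the adjusted p-values to alpha = 0.05 is equivalent to comparing the raw p-values to 0.05 / 3.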

In [None]:
# for bonferroni we use the function pairwise.t.test with the p.adj argument set to "bonf"
# pairwise.t.test(outcome, predictor, p.adj = "bonf")
cah2 %$% pairwise.t.test(attractive, agerange, p.adj = "bonf")

Two of the three pairwise differences are not significant.  Average self-rated attractiveness only significantly differs between the lowest and the highest age group.

In [None]:
cah2 %$% pairwise.t.test(attractive, gender, p.adj = "bonf")

We can run the pairwise tests of Gender, but there's only one pair, so it's the same information we got from the F-test above.

It's difficult to look at pairwise tests of interactions in this way, so let's also run Tukey HSD.

In [None]:
# TukeyHSD() with saved aov() object - we saved this in the ANOVA analysis section above.
TukeyHSD(attract_aov)

The only pairwise interaction that approaches significance is between the oldest females and the youngest females.

If you remember, none of our pairwise comparisons between age ranges were significant in the last lab.  By collapsing the groups and accounting for gender in the model, the results have changed.

Looking at our effect size, especially our r-squared value, will tell us if age group and gender have any predictive power for self-rated attractiveness substantively.

[Return to Top](#top)
<a id = "eff2"></a>
### Effect Size
We now need to look at the magnitude of these differences to see if they're substantively significant.

#### Unstandardized Effect Size
Looking at the Tukey output, we see that the differences in mean self-rated attractiveness between groups range from practically 0 to about 0.6pt (on our 1-10 scale).  I would consider 1pt on a 1-10 scale to be a substantial difference, since it amounts to 1/10th of the overall scale; however, the largest difference here is only about half a point, and may not be substantial.

#### R-squared
We'll take the short cut to calculate r-squared through running the model as a linear regression.

In [None]:
# calculate r-squared
ag_lm <- lm(attractive ~ agerange*gender, data = cah2)
rsq2 <- summary(ag_lm)$r.squared 
rsq2 # proportion
percent(rsq2, accuracy = .01) # percentage

The overall model (combination of agerange, gender, and the interaction of those variables) explains only 1.5% of the variance in self-rated attractiveness, which is negligible.  This is even lower than the 2% we saw last week with only agerange because we collapsed the age groups.
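
The regression shortcut works because r-squared is just the share of the total sum of squares that is NOT left in the residuals.  Here is a minimal sketch on simulated data (the data frame `sim` and its variables are made up) showing that the value computed from the ANOVA table matches the one from `lm()`.

```r
set.seed(1)
# Simulated two-factor data, 20 observations per cell
sim <- data.frame(
  a = factor(rep(c("a1", "a2", "a3"), times = 40)),
  b = factor(rep(c("b1", "b2"), each = 60))
)
sim$y <- rnorm(120) + ifelse(sim$a == "a3", 0.5, 0)

# Sums of squares from the ANOVA table: a, b, a:b, then Residuals (last row)
fit_aov <- aov(y ~ a * b, data = sim)
ss <- summary(fit_aov)[[1]][["Sum Sq"]]

# r-squared = 1 - SS_residual / SS_total
rsq_from_ss <- 1 - ss[length(ss)] / sum(ss)

# Same model run as a linear regression
rsq_from_lm <- summary(lm(y ~ a * b, data = sim))$r.squared

all.equal(rsq_from_ss, rsq_from_lm)
```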

Because the r-squared is so low, I won't bother looking at partial eta-squared (the variance explained by each variable separately) but you should look at the first example for an outline of that process.

#### Cohen's $f$
Finally, we can look at Cohen's $f$ just to get the full picture of the magnitude of the effect.

In [None]:
## calculate cohen's f using saved value of rsq
cohenf2 <- sqrt(rsq2 / (1-rsq2))
cohenf2

Our Cohen's $f$ is approximately 0.12, which falls between negligible and small on the rule of thumb scale (the same one we used with Cohen's d).  This again supports the conclusion that the analysis is not substantively significant, and probably only marginally statistically significant.

Let's move to some special cases.

[Return to Top](#top)
<a id = "special"></a>

## Special Case
Sometimes when our data doesn't adhere to our assumptions, there are alternative types of ANOVA analyses we can use.  

[Return to Top](#top)
<a id = "paired"></a>

### Paired (Repeated) Measures - the element of time
As we briefly looked at in t-tests, instead of our two samples coming from two different levels of a categorical variable, they could be the same measure taken in the same group of people at two different TIME periods.  

***THIS IS MORE ADVANCED THAN SOME OF THE OTHER MATERIAL - IT IS NOT REQUIRED THAT YOU KNOW ANY OF THIS AND IS PROVIDED FOR YOUR INFORMATION IN CASE YOU'RE INTERESTED IN MODELS LIKE THIS FOR YOUR PROJECT***

For this example I'm going to use some data from the American National Election Study (ANES), which you've seen some data from in your HW.  The ANES survey is conducted in each election year, and the same respondents are surveyed both pre- and post-election.  Not all of the feeling thermometer items are asked in both time periods, but feelings towards the two major presidential candidates are asked both pre- and post-election.  If the respondents' feelings toward the candidates are stable (not affected by anything that happened between the interviews or the outcome of the election) we wouldn't expect the pre- and post-election ratings to significantly differ.  

In [None]:
anesft <- readRDS("anes2.rds")
glimpse(anesft)
table(anesft$partyid)

In order to use time as a predictor we have to convert our dataframe from wide format (pre and post as two different columns) to long format where we have one variable that's time (pre or post) and one variable that's the feeling rating.  This is called long format because now instead of having one observation per respondent with columns for each time period we now have two observations per respondent, one for pre and the other for post.

In [None]:
anes_long <- anesft %>% 
                ## add person id so we can keep track of the pairs
                mutate(id = row_number()) %>% 
                ## pick only the variables we want to keep
                select(id, partyid, ft_pre_rep, ft_post_rep) %>%
                ## "gather the data from wide to long"
                gather(key = "time", value = "ft_trump", -id, -partyid) %>% 
                ## update the factor levels so they're not the old variable names
                mutate(time = fct_recode(time, "Pre-Election" = "ft_pre_rep", 
                                                       "Post-Election" = "ft_post_rep"))
head(anes_long)
tail(anes_long)

We can ignore the warning message - it's due to label attributes on the observations/variables in the ANES dataset.
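
If the wide-to-long idea is still fuzzy, here is a minimal sketch of the same reshape in base R on a tiny made-up data frame (the column names `ft_pre`, `ft_post` and `rating` are hypothetical, and the numbers are invented).  Each respondent goes from one row with two rating columns to two rows with one rating column.

```r
# Tiny made-up wide data frame: one row per respondent
wide <- data.frame(
  id = 1:3,
  ft_pre = c(10, 50, 85),
  ft_post = c(15, 40, 90)
)

# Stack the two rating columns and label each row with its time period
long <- data.frame(
  id = rep(wide$id, times = 2),
  time = rep(c("Pre-Election", "Post-Election"), each = nrow(wide)),
  rating = c(wide$ft_pre, wide$ft_post)
)
long
```

The `id` column is what lets us match each respondent's pre-election row to their post-election row later on.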

Let's look at our data graphically!  We can do this the same way we did above.

In [None]:
options(repr.plot.width=10, repr.plot.height=7) ## set options for plot size within the notebook -
# this is only for jupyter notebooks, you can disregard this


anes_long %<>% mutate(time = fct_relevel(time, rev)) 

anes_long %>% 
        ggplot(aes(x = time, y = ft_trump, fill = time)) + 
        geom_violin() +
        geom_boxplot(width=0.1, fill = "white", color = "black", size = 1)+
        stat_summary(fun = mean, geom = "errorbar", aes(ymax = after_stat(y), ymin = after_stat(y), color = "mean"),
                     width = 0.75, linetype = "solid", size = 2) +
        scale_fill_manual(values = c("#26D5B8", "#FF5733")) +
        scale_color_manual(values = "#39ff14")+
        labs(fill="Time",
             y = "Feeling about Trump",
             x = "",
             title = "Distribution of Feeling toward Trump - pre-election vs. post-election",
             color = "Group Mean") +
        theme(legend.position = "bottom", text = bold.14.text) +
        ylim(0,100)

These distributions are not even approximately normal - there is a large number of observations around 0 and then a second peak in the upper tail.  It looks like people have strong opinions about Trump - either positive or negative - and not many people are in the middle.

If we just wanted to look at this one variable (time) we could easily do a paired t-test; however, we also want to see the effect of partyid on rating.
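
As a reminder of what that paired t-test would look like, here is a minimal sketch on simulated pre/post ratings (made-up numbers, not the ANES data).  A paired t-test is the same thing as a one-sample t-test on the within-person differences.

```r
set.seed(7)
# Simulated ratings on a 0-100 scale for 30 hypothetical respondents
pre <- round(runif(30, 0, 100))
post <- pmin(100, pmax(0, pre + round(rnorm(30, mean = 5, sd = 10))))

# Paired t-test on the two sets of ratings
paired <- t.test(post, pre, paired = TRUE)

# Equivalent one-sample t-test on the differences
one_sample <- t.test(post - pre, mu = 0)

all.equal(unname(paired$statistic), unname(one_sample$statistic))
```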

In [None]:
options(repr.plot.width=10, repr.plot.height=5) ## set options for plot size within the notebook -
# this is only for jupyter notebooks, you can disregard this.

levels(anes_long$partyid) <- c("Democrat", "Republican")

anes_long %>% 
  ggplot(aes(x = time, color = partyid, group = partyid, y = ft_trump)) +
  stat_summary(fun = mean, geom = "point", size = 5) +
  stat_summary(fun = mean, geom = "line", size = 2) +
  scale_color_manual(values = c("blue", "red")) +
  labs(title = "Potential Interactions between time and political party in feelings toward Trump",
       x = "Time",
       y = "Feeling toward Trump",
       color = "Political Party") +
  theme(text = bold.14.text)

There isn't an interaction - we can see the lines are parallel.  Even though there is a HUGE difference in ratings of Trump by partyid, in both parties the average rating is higher post-election (after he won) than pre-election.

### Checking Assumptions
ANOVA analysis has 6 assumptions, 5 of which we can review/check before our analysis.

The basic ones we don't need to test but just need to confirm (possibly through looking at survey/data documentation):
1. Dependent variable is numeric - **yes, feeling rating from 0-100**
2. Group sample sizes are approximately equal - **yes - there are the exact same number of observations pre- vs. post-election because they're paired. The breakdown by partyid is also close (about 1k vs. about 1.2k)**
3. Independence of observations - **NO - we have repeated measures because they're coming from the same respondents at both time periods - we need to account for this in our analysis.**
4. No extreme outliers - **there are none - ratings are bounded between 0 and 100**

And our pre-check assumption we'll run in R:
5. Homogeneity of variance - the within group variance for each of the groups should be equal. **We will check this with Levene's Test**

In [None]:
LeveneTest(ft_trump~time, data = anes_long)
LeveneTest(ft_trump~partyid, data = anes_long)

The variances do not differ significantly between time periods (p = 0.54), but the variances within the partyid groups are not equal (p < 0.001).

Now we can run our analysis.  This is similar to how we've run it before, however we have to add an additional Error term to account for within person variance (the variance between the two observations from the same respondent).

We gave each person unique ids before converting our dataset to long, so we can use that id to add the Error term.

In [None]:
trump_aov <- aov(ft_trump ~ time*partyid + Error(id/(time + partyid)), data = anes_long)
summary(trump_aov)

So we get slightly different output.  First, R partitions out the SS due to the individual observations.  Then, we get the familiar ANOVA table with the main effects of partyid and time, plus the interaction effect.  The residuals here are everything that's left over after dealing with the 2 main effects and 1 interaction effect MINUS the SS due to individual respondents.  We should focus on the bottom table and do not need to interpret the top three - those are just there because we accounted for the within-subjects (paired) data.

There is a significant impact of time and partyid on ratings of Trump, but no interaction effect between time and partyid.
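
For comparison, a common way to write this kind of model is to make the respondent id a factor, so that each respondent forms their own error stratum (`Error()` only treats `id` as a grouping variable when it is a factor; a numeric id enters as a single numeric term).  The sketch below uses simulated data with made-up variable names and effect sizes - it is not the ANES model above, just an illustration under those assumptions.

```r
set.seed(3)
n <- 20  # simulated respondents, each measured twice
sim <- data.frame(
  id = rep(1:n, times = 2),
  time = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  party = factor(rep(rep(c("D", "R"), each = n / 2), times = 2))
)

# Each respondent gets their own baseline, plus made-up party and time effects
person_effect <- rnorm(n, sd = 10)
sim$rating <- 50 + person_effect[sim$id] +
  ifelse(sim$party == "R", 20, 0) +
  ifelse(sim$time == "post", 5, 0) +
  rnorm(2 * n, sd = 5)

# factor(id) makes each respondent their own error stratum
fit <- aov(rating ~ time * party + Error(factor(id)), data = sim)
summary(fit)
names(fit)
```

In this layout the between-person effect (party) is tested in the `factor(id)` stratum, and the within-person effects (time and the time:party interaction) are tested in the `Within` stratum.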

[Return to Top](#top)
<a id = "pqform"></a>

## PQ Format
Finally, we'll look at styling our ANOVA model output/results into PQ format tables for inclusion in reports and papers.

For this we'll make use of the `tidy()` function from the `broom` package to convert the ANOVA summary into a dataframe.  Then we can use kable (which we've used previously) to convert that into an attractive PQ table.

I'll also show an example of creating a PQ table from a dataframe using a new package that just came out - `gt`.
https://gt.rstudio.com/articles/intro-creating-gt-tables.html

We'll use the ANOVA results from Example 1 for this purpose.

In [None]:
# review the standard R output for our ANOVA model
summary(race_educ_aov)

In [None]:
# convert output to a df using tidy
df_aov <- broom::tidy(race_educ_aov)
df_aov

In [None]:
#convert df to pq format

#update the first column to PQ names
df_aov$term <- c("Education", "Race/Ethnicity", "Interaction of Education and Race", "Residuals")

#format columns 3 - 4 (Add commas) 
df_aov$sumsq <- comma(df_aov$sumsq)
df_aov$meansq <- comma(df_aov$meansq)

#format F and p-values, convert to character and replace NA with blank string ""
df_aov$statistic <- formatC(df_aov$statistic, digits = 3, format = "f")
df_aov$p.value <- formatC(df_aov$p.value, digits = 3, format = "f")
df_aov[4,5:6] <- ""

#update colnames to PQ
colnames(df_aov) <- c("Source of Variation", "DF", "Sum of Squares", "Mean Squares", "F-ratio", "p-value")

In [None]:
## use kable to convert to PQ table
tname <- "ANOVA Model: The impact of Education and Race on Income"
titlehead <- c(tname = 6)
names(titlehead) <- tname

df_aov %>% kable(booktabs = T, align = "lcrrcc") %>% 
            kable_styling(full_width = FALSE) %>% 
            add_header_above(header = titlehead, align = "l",
                             extra_css = "border-top: solid; border-bottom: double;") %>%
            row_spec(0, extra_css = "border-bottom: solid;") %>% 
            row_spec(nrow(df_aov), extra_css = "border-bottom: solid;") %>%
            save_kable("pq_anova.png") ## save as image

The resulting table looks like:

![](pq_anova.png)

### Making a PQ table using the new `gt` package

This package was recently released.  The goal is to create a "grammar of tables" similar to how ggplot2 is a "grammar of graphics."  This may be a bit more straightforward to use since we don't have to do any pre-editing of the dataframe, just use all of the helpful functions to adjust the barebones dataframe to full PQ format.

There are so many fun stylistic choices you can make (like with ggplot2) to adjust the style, format, colors, etc.

Full documentation: https://gt.rstudio.com/reference/index.html

In [None]:
# obtain tidy aov output again
df_aov2 <- broom::tidy(race_educ_aov)
df_aov2

In [None]:
#use gt to make PQ table
df_aov2 %>% 
  mutate(term = c("Education", "Race/Ethnicity", "Interaction", "Residuals")) %>% 
  gt(rowname_col = "term") %>% 
  ## add a header (table name)
  tab_header(
    title = md("**ANOVA Results**: The Impact of Education and Race on Income")) %>%  
                # wrapping text in double asterisks makes it bold
  ## format SS and MS to be more readable
  fmt_number(columns = 3:4, suffixing = TRUE) %>%  ## suffixing scales our SS and MS to billions (e.g. 65.54B instead of 65,541,593,327)
  fmt_number(columns = 5:6, decimals = 3) %>% ## round to 3 decimal places
  ## suppress NA from table output
  fmt_missing(columns = 5:6, missing_text = "") %>% 
  cols_label(sumsq = "Sum of Squares", ## relabel the columns with PQ names
             meansq = "Mean Squares",
             statistic = "F-ratio",
             p.value = "p-value") %>% 
  cols_align(align = "center") %>% 
  ## maybe I'll bold my significant p-values
  tab_style(style = cell_text(weight = "bold"), ## how I want to style the cells
            locations = cells_body(columns = vars(p.value), ## which column the cells are in I want to style
                                   rows = p.value < 0.05)) %>% ## logical indicating which rows to style
  ## add footer indicating what B means
  tab_footnote(footnote = "values rounded to billions of dollars.",
               locations = cells_body(columns = vars(sumsq, meansq))) %>%  
                ## even after relabeling columns we can use the varnames
  gtsave("gt_aov_table.png")

The resulting table looks like:

![](gt_aov_table.png)