# Multiple Regression and Interaction Terms

In this lab, we cover more regression modeling, including interaction terms and plots of marginal effects.

To begin, install the **{modelsummary}**, **{faux}**, and **{margins}** packages (if needed) and load them along with **{tidyverse}**.



In [None]:
install.packages(c("modelsummary", "faux", "margins"))

In [None]:
library(tidyverse)
library(modelsummary)
library(faux)
library(margins)

## The data

Let's again start with some simulated data so we can match regression output to "real" answers. 

Start with a random variable *x*: 250 draws from a uniform distribution on [3, 15].


In [4]:
xvar <- runif(n = 250, min = 3, max = 15)

Now create an indicator (dummy) variable by drawing uniformly at random from [0, 1] and rounding each draw to the nearest whole number.

In [5]:
dum <- runif(250, min = 0, max = 1) # draws fall in [0, 1], so rounding yields a 0/1 dummy var.
dum <- round(dum, 0) # the 0 is the number of decimal places to keep
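
As a quick sanity check, the sketch below uses base R only to confirm that rounding uniform draws yields nothing but 0s and 1s, in roughly equal shares. The seed value and the names `u`, `dum_check`, and `dum_alt` are illustrative assumptions, not part of the lab.

```r
set.seed(123)  # arbitrary seed, chosen only so the sketch is reproducible

# Round uniform draws on [0, 1]: values below 0.5 become 0, values above become 1
u <- runif(250, min = 0, max = 1)
dum_check <- round(u, 0)

table(dum_check)   # counts of 0s and 1s; a roughly even split is expected
mean(dum_check)    # share of 1s, should be near 0.5

# An alternative with the same distribution: Bernoulli draws with p = 0.5
dum_alt <- rbinom(250, size = 1, prob = 0.5)
```

Either approach gives a dummy that equals 1 about half the time; `rbinom()` is handy if you later want a share other than 0.5.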

Now let's create two correlated predictors of *y*. We'll use the *rnorm_multi()* function from the **{faux}** package.

In [6]:
newvars <- rnorm_multi(n = 250,
                        mu = c(7, 51),  # the means of the two vars
                        sd = c(3, 20),  # the standard deviations
                        r = .35,        # the correlation between the vars
                        varnames = c("xvar2", 
                                     "xvar3"))

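To build intuition for what "correlated predictors" means here, the following base-R sketch draws two normal variables with the same target means, standard deviations, and correlation using a Cholesky factor. It is an illustration of the general idea under stated assumptions, not a description of how **{faux}** works internally; the seed and the names `sigma`, `z`, and `sims` are hypothetical.

```r
set.seed(123)  # arbitrary seed for reproducibility

n   <- 250
mu  <- c(7, 51)   # target means
sds <- c(3, 20)   # target standard deviations
r   <- 0.35       # target correlation

# Build the covariance matrix from the SDs and the correlation
cor_mat <- matrix(c(1, r, r, 1), nrow = 2)
sigma   <- diag(sds) %*% cor_mat %*% diag(sds)

# Draw independent standard normals, then mix them with the Cholesky factor
z    <- matrix(rnorm(n * 2), ncol = 2)
sims <- z %*% chol(sigma)            # rows now have covariance sigma
sims <- sweep(sims, 2, mu, "+")      # shift each column to its target mean
colnames(sims) <- c("xvar2", "xvar3")

colMeans(sims)        # close to 7 and 51
apply(sims, 2, sd)    # close to 3 and 20
cor(sims)[1, 2]       # close to 0.35, but not exact in a finite sample
```

The sample correlation will wobble around .35 from draw to draw, which is worth remembering when comparing regression estimates to the "real" values.
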
Combine our data together into a data frame.

In [7]:
df <- tibble(xvar, dum, newvars)

Create your outcome variable. In the code below, I specify an interaction term, $xvar2 \times xvar3$.

In [8]:
df <- df |> mutate(
    yvar = -4 + 0.8*xvar - 3.64*dum + 1.06*xvar2 + 0.35*xvar3 -
        0.012*(xvar2*xvar3) +   # the interaction term
        rnorm(250, 0, 4))       # random noise with sd = 4
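
Written out, the code above corresponds to the following data-generating process, where $\varepsilon_i$ is notation introduced here for the `rnorm()` noise term:

$$
yvar_i = -4 + 0.8\,xvar_i - 3.64\,dum_i + 1.06\,xvar2_i + 0.35\,xvar3_i - 0.012\,(xvar2_i \times xvar3_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\ 4^2)
$$

These coefficients are the "real" answers that the regression estimates should approximate.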


## Modeling

Let's run three models, with each one adding variables until we get to our full, correct specification. We can store these model results and create our table using **{modelsummary}**.

In [None]:
m1 <- lm(yvar ~ xvar, data = df)  
m2 <- lm(yvar ~ xvar + dum + xvar2 + xvar3, data = df)
m3 <- lm(yvar ~ xvar + dum + xvar2*xvar3, data = df)

modelsummary(models = list(m1, m2, m3), 
            estimate = "{estimate}{stars}", # append significance stars to each estimate
            output = "jupyter") # renders the table in Jupyter; as before, leave this arg out of your own code

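Interpreting this output is tricky, especially the interaction term and its constituent terms: once the interaction is in the model, the coefficient on *xvar2* is its slope only when *xvar3* equals 0, and vice versa. Let's do some plotting!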

## Marginal Effects

Marginal effects are estimates of the change in $\hat{y}$ associated with a small change in a predictor variable. We typically calculate marginal effects when a model includes interaction terms, then graph them to see how the marginal effect of one variable changes with the values of the variable it is interacted with. We'll use the **{margins}** package.

In [None]:
# marginal effect of xvar2, evaluated at three values of xvar3
margins(m3, variables = "xvar2", at = list(xvar3 = c(0, 25, 90))) 
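
For a linear model with an interaction, this marginal effect has a simple closed form. Using $\hat{\beta}$ for the estimated coefficients of `m3` (notation introduced here for illustration):

$$
\frac{\partial \hat{y}}{\partial\, xvar2} = \hat{\beta}_{xvar2} + \hat{\beta}_{xvar2 \times xvar3} \cdot xvar3
$$

Plugging in the true values used to simulate `yvar` gives $1.06 - 0.012 \cdot xvar3$, which is $1.06$ at $xvar3 = 0$, $0.76$ at $xvar3 = 25$, and $-0.02$ at $xvar3 = 90$. The estimates from `margins()` should land near these numbers, up to sampling error.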

We can also directly plot the marginal effects.

In [None]:
# dx names the variable whose marginal effect is plotted; x names the variable it is conditioned on
cplot(m3, x = "xvar3", dx = "xvar2", what = "effect", data = df) 

That's fine, but let's save the output and graph it with ggplot to make it more visually appealing.

In [None]:
# save results in "out"
out <- cplot(m3, "xvar3", dx = "xvar2", what = "effect", data = df, draw = FALSE)

# use out data with ggplot

p <- ggplot(data = out, mapping = aes(x = xvals))
p + geom_line(mapping = aes(y = yvals),
              color = "black",
              linewidth = 2) +   # `size` for lines is deprecated as of ggplot2 3.4.0
    geom_ribbon(aes(ymin = lower, ymax = upper),
              color = "gray40",
              alpha = .2) +
    theme_light()
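
To see roughly what sits behind the plotted line and ribbon, the conditional effect and an approximate 95% interval can be computed by hand from a fitted model's coefficients and their covariance matrix. The sketch below is self-contained base R under stated assumptions: it simulates its own stand-in data (predictors drawn independently for brevity, arbitrary seed), and the names `sim`, `fit`, and `hand` are hypothetical. It is not a reproduction of the internals of **{margins}**.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Simulate stand-in data with the same true coefficients as the lab's yvar
n   <- 250
sim <- data.frame(
  xvar  = runif(n, 3, 15),
  dum   = rbinom(n, 1, 0.5),
  xvar2 = rnorm(n, 7, 3),
  xvar3 = rnorm(n, 51, 20)
)
sim$yvar <- with(sim, -4 + 0.8 * xvar - 3.64 * dum + 1.06 * xvar2 +
                   0.35 * xvar3 - 0.012 * (xvar2 * xvar3) + rnorm(n, 0, 4))

fit <- lm(yvar ~ xvar + dum + xvar2 * xvar3, data = sim)
b   <- coef(fit)   # estimated coefficients
V   <- vcov(fit)   # their variance-covariance matrix

# Conditional effect of xvar2 at chosen values of xvar3
x3     <- c(0, 25, 90)
effect <- unname(b["xvar2"] + b["xvar2:xvar3"] * x3)

# Standard error of a linear combination of the two coefficients
se <- sqrt(V["xvar2", "xvar2"] +
           x3^2 * V["xvar2:xvar3", "xvar2:xvar3"] +
           2 * x3 * V["xvar2", "xvar2:xvar3"])

hand <- data.frame(xvar3  = x3,
                   effect = effect,
                   lower  = effect - 1.96 * se,   # normal approximation
                   upper  = effect + 1.96 * se)
hand
```

Replacing `x3` with a fine grid, for example `seq(min(sim$xvar3), max(sim$xvar3), length.out = 100)`, gives a table with the same shape as `out`, which can be passed to the ggplot code above after renaming the columns.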
