# Moderation & Mediation


1. Graphical models
2. Moderation
3. Mediation
4. Moderated mediation

The lecture draws from Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. _Behavior Research Methods, 40_(3), 879-891.

---

# 1. Graphs

Before we get into the ideas of _moderation_ and _mediation_, we will want to take a step back and rethink how we think of and visualize our statistical models.

So far we have been discussing our statistical methods as having a general form of

$$ Y = f(X) $$

This is a _formulaic_ way of thinking of the relationship between X & Y. But another way to visualize the same system is graphically.

![SimpleGraph](imgs/SimpleGraph.png)

Here we are visualizing the same relationship as described in the equation above, but as a **graphical model** or a **graph**. In this case we have distilled $f()$ into an arrow or line that simply says that there exists a relationship between $X$ and $Y$, but makes no specifications on the nature of that relationship.

Formally a graph is a mathematical object that describes a set of relations between variables. All graphs have 2 parts.

* **nodes:** The objects (i.e., variables)
* **edges:** The connections (i.e., relations)

Now the edges themselves come in two flavors:

* **directed:** A causal relationship between two or more variables (i.e., X leads to Y)

![SimpleGraph](imgs/SimpleGraph.png)

* **undirected:** An association between two or more variables where the direction of causality is not assumed (i.e., a correlation)

![SimpleGraph](imgs/UndirectedGraph.png)
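To make the node/edge vocabulary concrete, here is a minimal sketch of how the two graphs above could be stored in code. The variable and function names are made up for illustration, and storing edges as tuples or frozensets is just one convenient choice, not a standard.

```python
# Directed graph: each (source, target) tuple is one edge, e.g. X -> Y.
directed_edges = {("X", "Y")}

# Undirected graph: each edge is a frozenset, so the order of the
# two nodes does not matter (X - Y is the same edge as Y - X).
undirected_edges = {frozenset(("X", "Y"))}


def nodes(edges):
    """Collect every node that appears in at least one edge."""
    return {node for edge in edges for node in edge}


def has_directed_edge(edges, source, target):
    """True only if there is an edge pointing from source to target."""
    return (source, target) in edges


def has_undirected_edge(edges, u, v):
    """True if u and v are connected, regardless of order."""
    return frozenset((u, v)) in edges


print(nodes(directed_edges))
print(has_directed_edge(directed_edges, "Y", "X"))
print(has_undirected_edge(undirected_edges, "Y", "X"))
```

The directed version distinguishes `X -> Y` from `Y -> X`, while the undirected version treats them as the same edge.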

There is tremendous value in representing a system of equations as a graph. Not the least is that it can easily distill complex relationships between variables in an intuitive way. But beyond that, graphs allow for describing, in mathematical terms, very complex relationships. 

Now there is an entire field of study on graphs that we do not have time to get into in this class. However, we bring up graphs because as we move away from simple associations between variables (e.g., linear regression, classification) to more complex, hierarchical relations, seeing them as graphs will help in understanding these more complex relationships.

Let us consider a few, shall we?



---

# 2. Moderation

We have actually already considered one of these more complex relationships when discussing interactions. Remember from our lectures on linear regression, interaction models are cases where two predictor variables interact to determine a level of an outcome variable.

Normally we wrote this as follows

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 $$

Here the regression coefficients describe the following relationships:

* **$\beta_1$:** Units that $Y$ changes per unit of $X_1$ when $X_2 = 0$.
* **$\beta_2$:** Units that $Y$ changes per unit of $X_2$ when $X_1 = 0$.
* **$\beta_3$:** How much the effect of $X_1$ on $Y$ changes per unit of $X_2$ (and vice versa).


If you are used to thinking about interactions in the context of ANOVA you likely see $X_1$ and $X_2$ as independent manipulators of $Y$, and your goal for the analysis is to just see how they play against each other. 

However, there is another way to view this problem. Let's rename $X_2$ as $W$, which we will call a _moderator_ variable. The goal of **moderation analysis** is to see how $W$ tempers the relationship between your main predictor ($X$) and outcome ($Y$) variables. But the equation is still the same.

$$ Y = \beta_0 + \beta_1 X + \beta_2 W + \beta_3 XW $$

Here $\beta_3$ now describes this moderating effect.
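Regrouping the terms of the same equation can make the role of $\beta_3$ easier to see. Nothing new is assumed here; it is only algebra on the model above.

$$ Y = (\beta_0 + \beta_2 W) + (\beta_1 + \beta_3 W) X $$

The slope of $Y$ on $X$ is $\beta_1 + \beta_3 W$, so $\beta_3$ is how much that slope shifts per unit of $W$. For a binary moderator coded 0/1, the slope is $\beta_1$ when $W=0$ and $\beta_1 + \beta_3$ when $W=1$.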

Visually, we would describe this relationship this way.

![SimpleGraph](imgs/ModerationGraph.png)

In this case, we say that $W$ has a moderating effect only if $\beta_3 \neq 0$.



**Example:**

Let's ask whether gender moderates the effect of childhood trauma on risk aversion. In this case, the variables of the model are:

* $Y$ ~ a behavioral measure of risk aversion
* $X$ ~ a childhood trauma index score
* $W$ ~ gender (male = 0, female = 1)

If we fit the model

$$ Y_{risk} = \hat{\beta_0} + \hat{\beta_1} X_{trauma} + \hat{\beta_2} W_{gender} + \hat{\beta_3} X_{trauma}W_{gender} $$

and the term $\hat{\beta_3}$ is significantly different from zero (e.g., via a parametric null hypothesis test), then we can say that gender moderates the relationship between childhood trauma and risk aversion. For example, if $\hat{\beta_1} > 0$ and $\hat{\beta_3} \ll 0$, then that would indicate that the effect of trauma on risk aversion is weaker in females than in males.
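As a rough sketch of what fitting this model could look like, the snippet below simulates data with a negative interaction and estimates the coefficients by ordinary least squares with NumPy. The data are made up (the coefficient values, sample size, and variable names are all assumptions for illustration), and the standard errors are the classical OLS ones, which is only one of several ways to test $\hat{\beta_3}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for the example variables (not real data).
trauma = rng.normal(size=n)                         # X
gender = rng.integers(0, 2, size=n).astype(float)   # W: male = 0, female = 1
risk = (1.0 + 0.8 * trauma + 0.2 * gender
        - 0.5 * trauma * gender
        + rng.normal(scale=0.5, size=n))            # Y

# Design matrix columns: intercept, X, W, and the X*W interaction.
design = np.column_stack([np.ones(n), trauma, gender, trauma * gender])
beta_hat, *_ = np.linalg.lstsq(design, risk, rcond=None)

# Classical OLS standard errors: sqrt(diag(sigma^2 * (D'D)^-1)).
residuals = risk - design @ beta_hat
dof = n - design.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
t_stats = beta_hat / std_err

# Slope of risk aversion on trauma within each group.
slope_male = beta_hat[1]
slope_female = beta_hat[1] + beta_hat[3]

print(f"beta_3 = {beta_hat[3]:.3f}, t = {t_stats[3]:.2f}")
print(f"slope (male) = {slope_male:.3f}, slope (female) = {slope_female:.3f}")
```

A large negative $t$ statistic on the interaction term corresponds to the weaker slope in the group coded $W=1$.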




**Caution:**

A few points of caution before we move on to mediation. 

1. It is important to keep in mind that moderation is not indicating a causal role of $W$ in the relationship between $X$ & $Y$. It is simply saying that, from a statistical perspective, including $W$ in your model significantly alters the relationship between $X$ and $Y$. 

2. As a result of point #1, you cannot determine whether the moderator variable is directly interacting with $X$ & $Y$ or reflects the impact of another variable that correlates both with $W$ and with $X$ (i.e., "third variable problem").



---

# 3. Mediation

Now let's move on to a slightly more complex relationship. What if you want to see whether another variable (or variables) determines the relationship between $X$ & $Y$? In other words, is the presence of variable $M$ a _necessary_ condition for the state of the $Y=f(X)$ relationship? If so, then we would consider $M$ to be a _mediator_ variable.

Graphically we describe the mediating effects of $m$ mediator variables on the relationship between a predictor variable $X$ and response variable $Y$ like this

<img src="imgs/PathModels_model1.png" width=400>

Notice we are changing the naming relationships a little bit. Let us take a closer look at the edges in this graph.

* **$a_i$:** The influence of predictor $X$ on the $i^{th}$ mediator variable $M_i$.
* **$b_i$:** The influence of the $i^{th}$ mediator variable $M_i$ on the response variable $Y$.
* **$a_ib_i$:** The indirect pathway carrying the influence of the predictor variable $X$ on the response variable $Y$ through the $i^{th}$ mediator variable $M_i$.
* **$c'$:** The direct pathway between the predictor $X$ and response $Y$ variables **after** accounting for all _m_ mediating variables $M_1, \dots, M_m$.

The reason we call the direct pathway $c'$ and not $c$ is that it is looking at the relationship between $X$ & $Y$ once all the mediating effects are accounted for. If you don't account for the mediators, then you're looking at the more traditional regression problem. In mediation world speak we call this the _total pathway_.

* **$c$:** The total pathway between the predictor $X$ and response $Y$ variables, without any intermediating effects accounted for.





**Estimating mediation effects**

While there are several proposed approaches to quantifying mediating relationships, we will focus here on one of the most popular versions, originally proposed by Preacher & Hayes (2008). Preacher & Hayes framed the mediation model as a series of three simultaneous regression estimation problems.

 1. $Y=\hat{c}X$
 2. $M=\hat{a}X$
 3. $Y=\hat{b}M + \hat{c'}X$
 
Notice how in here you are fitting all 4 major pathways that I described above: $a,b,c, c'$.
 
Calculating the indirect pathway effect (i.e., $ab$) is then as simple as multiplying $\hat{a}$ by $\hat{b}$. That's it. Now it becomes easy to see why they are called the _indirect_ and _direct_ pathways.

$$
\begin{aligned}
Y &= bM + c'X \\
&= b(aX) + c'X \\
&= abX + c'X \\
&= (\text{indirect})X + (\text{direct})X
\end{aligned}
$$
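The three regressions can be sketched in a few lines of NumPy. This is an illustrative sketch on simulated data, not the authors' own implementation: the helper `ols`, the coefficient values, and the choice to include an intercept in each model are assumptions made here.

```python
import numpy as np


def ols(predictors, outcome):
    """Least-squares coefficients for outcome ~ intercept + predictors."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs  # coefs[0] is the intercept


rng = np.random.default_rng(1)
n = 400

# Simulated data with a true a = 0.6, b = 0.7, c' = 0.2.
X = rng.normal(size=n)
M = 0.6 * X + rng.normal(scale=0.5, size=n)
Y = 0.7 * M + 0.2 * X + rng.normal(scale=0.5, size=n)

c_hat = ols([X], Y)[1]                   # 1. total pathway:  Y ~ X
a_hat = ols([X], M)[1]                   # 2. a pathway:      M ~ X
_, c_prime_hat, b_hat = ols([X, M], Y)   # 3. c' and b:       Y ~ X + M

indirect = a_hat * b_hat

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, ab = {indirect:.3f}")
print(f"c' = {c_prime_hat:.3f}, c = {c_hat:.3f}")
```

With ordinary least squares fit on the same sample, the estimated total pathway equals the sum of the estimated indirect and direct pathways, so `c_hat` and `indirect + c_prime_hat` agree up to floating-point error.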

If $ab \neq 0$ then you can say that M has a significant mediating effect on the relationship between $X$ and $Y$. Put another way, we say that $X$ has an indirect influence on $Y$ via $M$. 

How do you know if this is a _statistically significant_ effect? Well we can estimate confidence intervals on $ab$ using bootstrapping. So we re-estimate the mediation model by sampling with replacement. After _n_ iterations we look at the distribution of $ab$ values that we obtained from the bootstraps and then calculate the 95% confidence intervals. If 0 does not fall within the confidence interval, then we reject the null and say that the mediating effect is significant.
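The bootstrap procedure described above might look like the following. This is a minimal sketch that uses the plain percentile interval on simulated data; the function names, the number of resamples, and the data-generating values are assumptions for illustration, and other interval constructions exist.

```python
import numpy as np


def indirect_effect(X, M, Y):
    """Return a*b from M ~ X and Y ~ M + X (intercepts included)."""
    ones = np.ones(len(X))
    a = np.linalg.lstsq(np.column_stack([ones, X]), M, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, M, X]), Y, rcond=None)[0][1]
    return a * b


def bootstrap_ci(X, M, Y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(X)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        estimates[i] = indirect_effect(X[idx], M[idx], Y[idx])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return lower, upper


# Simulated data with a true indirect effect of 0.6 * 0.7 = 0.42.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=n)
M = 0.6 * X + rng.normal(scale=0.5, size=n)
Y = 0.7 * M + 0.2 * X + rng.normal(scale=0.5, size=n)

lower, upper = bootstrap_ci(X, M, Y)
significant = not (lower <= 0.0 <= upper)

print(f"95% CI for ab: [{lower:.3f}, {upper:.3f}], significant: {significant}")
```

Note that whole rows are resampled together, so each bootstrap sample keeps every observation's $X$, $M$, and $Y$ values paired.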







**Example:** 

Simply for purposes of illustration, we can re-examine the toy model we presented above, but treat gender as a mediator instead of a moderator.

* $Y$ ~ a behavioral measure of risk aversion
* $X$ ~ a childhood trauma index score
* $M$ ~ gender (male = 0, female = 1)

If we fit the model according to the Preacher & Hayes method, and the bootstrapped confidence intervals on $ab$ do not include 0, then we can now interpret gender as _mediating_ the effect of childhood trauma on risk aversion.

Now colloquially the word "mediation" has a causal implication. In this example, it can be easy to think that this means that gender _determines_ whether trauma induces risk aversion. But that's not actually the case here. What the mediation model is saying is that gender is explaining a significant "portion of the variance" in the relationship between trauma and risk aversion. The difference is subtle but important. With these sorts of models, you can have _partial mediating_ effects. This means that the presence of the mediator is necessary for a part of the relationship, but not the whole relationship.

**Assumptions**

While the use of bootstrapping does make the evaluation of the mediating effect non-parametric (and thus reliant on minimal assumptions), mediation tests like those proposed by Preacher and Hayes do have some assumptions to be aware of.

 1. Both $a$ and $b$ pathways have to be $\neq 0$ (from a null hypothesis test with a predetermined $\alpha$) for an $ab$ pathway to be inferred as being significant. So even if the bootstrap reveals that the confidence interval for $ab$ does not include zero, if the confidence intervals on either $a$ or $b$ alone do include zero (i.e., not statistically significant), then you cannot reject the null on the indirect pathway effect.

 2. You have sufficient statistical power to run the mediation test. Underpowered models (e.g., too small of a sample size) can inflate the false positive rate because the models themselves are more complicated than traditional regression models.

**Not Assumptions**

It can be easy to think that another assumption of mediation models is that the original relationship (i.e., total pathway, $c$) between $X$ and $Y$ must also be significant for an indirect pathway effect to be significant. However, this is not the case. Sometimes, the indirect pathway can mask or hide a true total pathway effect.

This can make sense if you look closer at the total pathway model.

$$ Y = cX $$

By definition, the total pathway is composed of both the direct and indirect pathways. Thus, you can rewrite above as.

$$  Y = (ab + c')X $$

It could be the case that $c=0$ because $ab=c'=0$. However, it could also be the case that $c=0$ even if both $ab \neq 0$ and $c' \neq 0$. How? Well in the case where $ab + c' = 0$ (or, put another way, $\frac{c'}{ab}=-1$). In this case the competition between the direct and indirect pathways causes the total pathway effects to wash out.
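A quick simulation can illustrate this cancellation. The sketch below builds data where the indirect and direct effects are equal in size and opposite in sign ($ab = -c'$); the coefficient values and the helper `slopes` are hypothetical choices made for the demonstration.

```python
import numpy as np


def slopes(predictors, outcome):
    """OLS slopes (intercept dropped) for outcome ~ intercept + predictors."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    return np.linalg.lstsq(design, outcome, rcond=None)[0][1:]


rng = np.random.default_rng(4)
n = 2000

X = rng.normal(size=n)
M = 1.0 * X + rng.normal(size=n)             # a = 1.0
Y = 0.5 * M - 0.5 * X + rng.normal(size=n)   # b = 0.5, c' = -0.5, so ab = -c'

(c_hat,) = slopes([X], Y)                # total pathway
(a_hat,) = slopes([X], M)                # a pathway
c_prime_hat, b_hat = slopes([X, M], Y)   # direct pathway and b pathway
indirect = a_hat * b_hat

print(f"ab = {indirect:.3f}, c' = {c_prime_hat:.3f}, c = {c_hat:.3f}")
```

Here the total pathway estimate sits near zero even though both the indirect and direct estimates are clearly non-zero.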

**Multiple Mediators**

Mediation models extend fairly easily to any number of mediation variables (so long as the model is sufficiently powered). In fact, the first graphical illustration I used above shows this.

<img src="imgs/PathModels_model1.png" width=400>

Here all _m_ mediators are independent of each other, so they can be included in the same model fit.

* $M_i = \hat{a}_i X$
* $Y = \sum_{i=1}^m \hat{b}_i M_i + \hat{c}'X $



**Example:**

Let's keep going with the example problem we have been using on childhood trauma and risk aversion.

* $Y$ ~ a behavioral measure of risk aversion
* $X$ ~ a childhood trauma index score
* $M_1$ ~ parental socioeconomic status (SES)
* $M_2$ ~ psychiatric risk score
* $M_3$ ~ social network size

Here we can fit the following mediation model and use bootstrapping to determine the significance of each indirect pathway.

* $ M_1 = \hat{a}_1 X $
* $ M_2 = \hat{a}_2 X $
* $ M_3 = \hat{a}_3 X $
* $ Y = \hat{c}'X + \sum_{i=1}^m \hat{b}_i M_i$

In this case, if the bootstrapped confidence intervals show that $\hat{a}_1\hat{b}_1 < 0$ and $\hat{a}_2\hat{b}_2 > 0$, while the interval for $\hat{a}_3\hat{b}_3$ includes 0 (and the individual $\hat{a}_i$ and $\hat{b}_i$ pathways for $M_1$ and $M_2$ are also significant), then we would say that:

* parental SES has a negative mediating effect
* psychiatric risk has a positive mediating effect
* social network size has no mediating effect
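A sketch of the multiple-mediator fit on simulated data is shown below. The coefficient values are arbitrary and chosen only so the three indirect pathways come out negative, positive, and near zero; the variable names and the helper `ols` are assumptions, and in practice each indirect pathway would also get a bootstrapped confidence interval.

```python
import numpy as np


def ols(predictors, outcome):
    """Least-squares coefficients for outcome ~ intercept + predictors."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    return np.linalg.lstsq(design, outcome, rcond=None)[0]


rng = np.random.default_rng(2)
n = 1000

# Simulated stand-ins for the example variables (not real data).
trauma = rng.normal(size=n)                          # X
ses = -0.5 * trauma + rng.normal(size=n)             # M_1
psych_risk = 0.6 * trauma + rng.normal(size=n)       # M_2
network = rng.normal(size=n)                         # M_3, unrelated to X
risk = (0.4 * ses + 0.5 * psych_risk + 0.3 * network
        + 0.1 * trauma + rng.normal(size=n))         # Y

mediators = [ses, psych_risk, network]

# a_i: one regression per mediator, M_i ~ X.
a_hat = np.array([ols([trauma], m)[1] for m in mediators])

# b_i and c': a single regression, Y ~ X + M_1 + M_2 + M_3.
coefs = ols([trauma] + mediators, risk)
c_prime_hat, b_hat = coefs[1], coefs[2:]

indirect = a_hat * b_hat          # one a_i * b_i per mediator
c_hat = ols([trauma], risk)[1]    # total pathway, Y ~ X

for name, value in zip(["SES", "psychiatric risk", "network size"], indirect):
    print(f"indirect via {name}: {value:.3f}")
```

As in the single-mediator case, the least-squares total pathway equals the direct pathway plus the sum of the indirect pathways on the same sample.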





---
# 4. Moderated mediation

So far I have presented moderation and mediation as two separate analyses, but they are not necessarily mutually exclusive. In fact, you can put them both together to describe even more complex relationships. These models are, of course, called _moderated mediation_ models and can be visualized like this.

<img src="imgs/PathModels_model2.png" width=400>

As before $M_i$ is a mediator variable while $W$ is a moderator _of the mediating effect_. What this means is that the degree of the mediating effect that $M_i$ has on the $X$ to $Y$ relationship can itself be moderated by the factor $W$.

$$M_i = a_iX + e_iW + f_iXW$$

Notice here, just like in an interaction model, we are applying the hierarchical principle (i.e., including the main effect terms in the estimation of the interaction).

We interpret the pathway $f$ as the moderating mediator relationship, while $e$ describes the direct relationship $W$ has on the mediating variable itself. 
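To illustrate the $a$, $e$, and $f$ pathways, here is a small sketch that fits the mediator equation on simulated data with a single mediator and a binary moderator. All values and names are hypothetical and chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 600

# Simulated data (hypothetical values, for illustration only).
X = rng.normal(size=n)
W = rng.integers(0, 2, size=n).astype(float)   # binary moderator
M = 0.3 * X + 0.2 * W + 0.5 * X * W + rng.normal(scale=0.5, size=n)

# M ~ intercept + a*X + e*W + f*X*W
design = np.column_stack([np.ones(n), X, W, X * W])
_, a_hat, e_hat, f_hat = np.linalg.lstsq(design, M, rcond=None)[0]

# Effect of X on the mediator at each level of the moderator.
a_path_w0 = a_hat
a_path_w1 = a_hat + f_hat

print(f"a = {a_hat:.3f}, e = {e_hat:.3f}, f = {f_hat:.3f}")
print(f"X -> M slope: W=0 gives {a_path_w0:.3f}, W=1 gives {a_path_w1:.3f}")
```

A non-zero `f_hat` means the strength of the $X \rightarrow M$ link depends on $W$, which is the moderated part of the mediation.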

Put together, this makes the prediction of $Y$ dependent on a series of moderating and mediating factors. In the example above, $Y$ is now described as

$$
\begin{aligned}
Y &= \sum_{i=1}^m \hat{b}_i M_i + \hat{c}'X \\
&= \sum_{i=1}^m \hat{b}_i (a_iX + e_iW + f_iXW) + \hat{c}'X
\end{aligned}
$$
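Collecting the terms that multiply $X$ in the expression above shows how $W$ changes the indirect effect. This is only a regrouping of the same equation, with no new assumptions.

$$ Y = \sum_{i=1}^m \hat{b}_i e_i W + \left[ \hat{c}' + \sum_{i=1}^m \hat{b}_i (a_i + f_i W) \right] X $$

The indirect effect of $X$ on $Y$ through $M_i$ is therefore $\hat{b}_i (a_i + f_i W)$. It equals $a_i \hat{b}_i$ when $W = 0$ and shifts by $f_i \hat{b}_i$ for each unit increase in $W$.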

Of course, in the example of moderated mediation that I have used so far, the moderating effect happens on the $a$ pathway (i.e., on the relationship between the predictor $X$ and each mediator $M_i$). The moderation effects can also happen on the $b$ pathway (i.e., on the relationship between each mediator $M_i$ and the outcome variable $Y$). In addition, the moderating variables can have their _own_ moderating, indirect pathways. These are known as step-wise models.

An example of a step-wise moderated mediation model with moderators on both the $a$ and $b$ pathways would look like this.

<img src="imgs/PathModels_model4.png" width=600>

You can see how the complexity of these modeled relationships can explode relatively quickly. However, with this increased complexity comes greater power at describing more nuanced and hierarchical relationships between your variables.