# Multiple Explanatory Variables

A primary reason to study multiple explanatory variables is to establish causation.
To show causation, we need the following:

1. Statistical test showing a _significant_ association.
2. Experimental design establishing time order.
3. Control to rule out **confounding variables**.
    This can be accomplished through experimental design or statistical testing.
    
    - **Random assignment** is an experimental design approach to control for confounding variables.
    - **Dependent sampling** is another experimental design approach.
    - A **Chi-square** test, repeated within each level of a suspected confounder, is a statistical approach to checking for confounding variables.
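
As a minimal sketch of why random assignment helps, the simulation below shuffles participants into two groups and compares a potential confounder across them. The confounder (age), the group sizes, and the distribution are all made-up assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical confounder: ages of 1,000 simulated participants.
ages = [random.gauss(40, 10) for _ in range(1000)]

# Random assignment: shuffle, then split into treatment and control.
shuffled = ages[:]
random.shuffle(shuffled)
treatment, control = shuffled[:500], shuffled[500:]


def mean(values):
    return sum(values) / len(values)


# With random assignment the groups should be similar on the confounder,
# even though we never used age to form them.
age_gap = abs(mean(treatment) - mean(control))
print(f"mean age, treatment: {mean(treatment):.2f}")
print(f"mean age, control:   {mean(control):.2f}")
print(f"gap:                 {age_gap:.2f}")
```

The same balancing applies, on average, to confounders we did not measure, which is the main advantage over statistical control.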

We can identify several types of association between explanatory variables and outcome variables, in cases where the explanatory variable is *not* the sole cause for the outcome.  
Below, consider outcome $y$ associated with explanatory variable $x_1$, and caused by explanatory variable $x_2$:

1. **Spurious Association** No causation between the explanatory and outcome variables; the association is purely correlational.  
    ($x_1 \not\Rightarrow y \land x_2 \implies y$)  
    It is possible the two share a common cause ($x_2 \implies x_1 \land x_2 \implies y$).
2. **Chain Relationships** Mediators, which sit on the causal path between another explanatory variable and the outcome.  
    ($x_1 \implies x_2 \implies y$)  
    Above is _complete_ mediation; partial mediation is possible, where $x_1$ has some direct association:  
    ($x_1 \implies x_2 \implies y \land x_1 \implies y$)  
3. **Suppressor Variable** The explanatory variable must be included in order for another explanatory variable to have a statistical association with the outcome.  
    Roughly ($x_2 \implies y \land x_1 \implies \neg y$), or possibly ($x_2 \implies y \land x_1 \implies \neg x_2$); this notation is tentative.  
    Basically, $x_2$ causes $y$, but this won't be visible statistically until we include the effects of $x_1$.
4. **Multiple Causes** The explanatory variable is one of several with a causation relationship to the outcome.  
    ($x_1 \implies y \land x_2 \implies y$)  
    It is possible that the explanatory variables have an association as well.  
    ($x_1 \implies y \land x_2 \implies x_1 \land x_2 \implies y$)
5. **Statistical Interaction** Moderators, which cause different groups to have different associations between the explanatory and outcome variables.  
    ($x_1 \implies y \mid x_2$)
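
The suppressor case is the least intuitive of these, so here is a small simulated sketch of one way it can arise. It assumes $x_1$ tracks $x_2$ and pushes $y$ in the opposite direction; the coefficients, sample size, and helper names (`slope`, `residuals`) are assumptions for illustration. The adjusted slope is computed by removing the linear effect of $x_1$ from both $x_2$ and $y$, which gives the same coefficient as a regression of $y$ on both variables.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
n = 2000


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Simulated data: x2 raises y, x1 tracks x2 and lowers y by the same amount.
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [v + random.gauss(0, 1) for v in x2]
y = [a - b + random.gauss(0, 0.5) for a, b in zip(x2, x1)]

# Without x1, the effect of x2 is hidden (slope close to 0).
marginal = slope(x2, y)
# Holding x1 fixed, the effect of x2 reappears (slope close to 1).
adjusted = slope(residuals(x1, x2), residuals(x1, y))

print(f"slope of x2 alone:         {marginal:.2f}")
print(f"slope of x2 adjusting x1:  {adjusted:.2f}")
```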

## Examples

Statistical Interaction:  
Consider an outcome variable of mood, with explanatory variables of ADHD score and meditation time.
We might expect mood to improve with meditation time.
However, this may be moderated by ADHD score, where people with low scores have a positive association between meditation and mood, while people with high scores may have a negative association.
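
A minimal simulated sketch of this moderation pattern follows. The slopes, noise level, and group sizes are invented for illustration and are not real findings about ADHD or meditation. It fits a separate slope for each group and a pooled slope that ignores the moderator.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def simulate(true_slope, n=500):
    """Hypothetical group: mood as a linear function of meditation minutes."""
    minutes = [random.uniform(0, 60) for _ in range(n)]
    mood = [5 + true_slope * m + random.gauss(0, 1) for m in minutes]
    return minutes, mood


low_adhd = simulate(0.05)    # assumed positive association
high_adhd = simulate(-0.03)  # assumed negative association

slope_low = slope(*low_adhd)
slope_high = slope(*high_adhd)
# Ignoring the moderator blends the two opposite associations together.
pooled = slope(low_adhd[0] + high_adhd[0], low_adhd[1] + high_adhd[1])

print(f"low-ADHD slope:  {slope_low:+.3f}")
print(f"high-ADHD slope: {slope_high:+.3f}")
print(f"pooled slope:    {pooled:+.3f}")
```

The pooled slope sits between the two group slopes, which is why an interaction can go unnoticed unless the groups are examined separately.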

## Controlling for Third Variable

In the cases above, where we believe some variable $x_2$ causes $y$ but we want to study the effects of $x_1$, there are two ways to control for $x_2$:

1. Experimental Design.  
    Use random assignment or dependent sampling, as discussed above.
2. Statistical Control.  
    In regression, we'd add $x_2$ to the model and check how the slope of $x_1$ and the explained variance change.
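
To illustrate statistical control, the sketch below simulates a spurious association in which $x_2$ causes both $x_1$ and $y$, then compares the slope of $x_1$ before and after adjusting for $x_2$. The data, coefficients, and helper names are assumptions. The adjustment removes the linear effect of $x_2$ from both $x_1$ and $y$, which yields the same $x_1$ coefficient as fitting a regression with both variables.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible
n = 2000


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Simulated common cause: x2 drives both x1 and y; x1 has no effect on y.
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [v + random.gauss(0, 1) for v in x2]
y = [2 * v + random.gauss(0, 1) for v in x2]

unadjusted = slope(x1, y)  # looks like a strong effect of x1
adjusted = slope(residuals(x2, x1), residuals(x2, y))  # near 0 once x2 is held fixed

print(f"slope of x1 alone:         {unadjusted:.2f}")
print(f"slope of x1 adjusting x2:  {adjusted:.2f}")
```

A slope that collapses toward zero after adjustment is the pattern expected for a spurious association; a slope that only shrinks would instead be consistent with partial mediation or multiple causes.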