# Elements of a Model
Now that we have more of a sense of what a model is, let us talk more specifically about the elements that make up a statistical model. The aim here is to remain very general so that the same framework can be used for a wide variety of analyses. The disadvantage of doing this is that it will be a bit abstract to begin with. However, just try to absorb this way of thinking as best you can and hopefully it will make more sense when we start to see applications in the coming weeks.

## Variable Terminology
Before going any further, we need to define some terminology. So far, we have only focussed on a single variable that is measured as part of an experiment. However, an experiment is more than just an exercise in *measurement*. Rather, an experiment is an exercise in assessing how that measurement *changes* after the manipulation of other variables[^manip-foot]. In order to discuss statistical models, we therefore need to discuss these other variables and how they can be integrated into the framework we have already established. 

Our variable of primary interest is known as the *outcome variable*, whereas any other variables that we believe to be related to the outcome are known as *predictor* variables. For example, within the `mtcars` dataset we can consider `mpg` as our outcome variable with `wt` (*weight*) and `hp` (*horsepower*) as our predictor variables. In notation, we refer to our outcome as $y$ and our predictors as $x$. Our assumption is then that the value of our *outcome* is some function of our $k$ *predictors*, plus error

$$
y = f(\mathbf{x}) + \epsilon,
$$

where $\mathbf{x}$ refers to all the predictor variables 

$$
\mathbf{x} = \left\{x_{1}, x_{2}, x_{3}, \dots, x_{k}\right\}.
$$ 

This is just a formal way of saying that we assume that there is some *connection* between our outcome and our predictors. The actual form of this connection is not known, but we assume it exists (otherwise, why did we conduct the experiment?). In the `mtcars` example, we could write

$$
\text{MPG} = f(\text{Weight}, \text{Horsepower}) + \epsilon,
$$

to indicate our fundamental assumption that the MPG of a car is connected to both its Weight and its Horsepower. We do not know what this connection is yet, and we do not assume that we will be able to predict MPG *perfectly* from just Weight and Horsepower (hence the errors), but we do believe that there is some useful way of predicting MPG from these variables. 
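To make this decomposition concrete, the sketch below picks an arbitrary candidate for $f$ and treats the error as whatever is left over. The function `f_guess` and its coefficients are made up purely for illustration; they are not estimated from the data and are not the function we will eventually use.

```r
# Outcome and predictors from the built-in mtcars data
y  <- mtcars$mpg
wt <- mtcars$wt
hp <- mtcars$hp

# A hypothetical guess at f(); the numbers are arbitrary, not estimates
f_guess <- function(wt, hp) 37 - 4 * wt - 0.03 * hp

# The error is whatever f() fails to account for
epsilon <- y - f_guess(wt, hp)

# By construction, outcome = f(predictors) + error for every car
all.equal(y, f_guess(wt, hp) + epsilon)
```

Whatever function we choose, the equation holds by construction. What changes between a good and a poor choice of $f$ is how large the errors need to be.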

`````{admonition} Variable Names
:class: tip
Within this lesson, we have used the terms *outcome* and *predictor* variables. However, you may already know these by different names. Within psychology, the outcome variable is more commonly known as the *dependent* variable (or DV), with the predictor variables more commonly known as the *independent* variables (or IVs). We have avoided these terms for two main reasons. Firstly, IV and DV are not as commonly used within the statistical literature and so it is helpful to get used to slightly different language. Secondly, the terms *dependent* and *independent* tend to be used in relation to whether variables are *correlated* and thus it is confusing to refer to independent dependent variables, or dependent independent variables. We find the terms *outcome* and *predictor* much more descriptive of the experimental situation. We measure the outcome variable as the *outcome of our experiment* and we assume that the predictor variables can be used to *predict the value of the outcome*. The table below gives some alternative terms for the same concepts:

| Concept            | Common Alternatives                                                           |
|--------------------|-------------------------------------------------------------------------------|
| Outcome Variable   | Dependent Variable (DV), Response Variable, Target Variable                   |
| Predictor Variables| Independent Variables (IVs), Input Variables, Features, Explanatory Variables |

`````


## Model Components
To help us understand parametric statistical models, we can break them down into three components. This is useful, because whenever we start trying to understand a new model we simply need to understand these components to conceptualise what the model is doing. These components are summarised in the table below.

| Component               | Description |
|-------------------------|-------------|
| Population Distribution | The assumed form that our random variable of interest takes within the population. This can either be a continuous or discrete distribution, depending upon the variable in question |
| Mean Function           | An equation that connects the values of the predictor variables to the *expected value* of the population distribution. This can be seen as *converting* the values of the predictor variables into a value for the outcome variable |
| Variance Function       | Much like the mean function, this is an equation that connects the values of the predictors to the *variance* of the population distribution. This captures how the width of a distribution may change, depending upon the value of the predictors. |

So, for every new parametric statistical model we encounter we need to ask what the assumed population distribution is, what the mean function is and what the variance function is. This is applicable to a large variety of models and transcends both Frequentist and Bayesian approaches. This also connects very directly with the discussions of the *expected value* and *variance* of a distribution from the previous section. We will now work through each of these elements to understand their purpose and how they work. We will see many more examples of these throughout the course, so do not worry if these concepts do not click straight away.

## Component 1: The Population Distribution
The first element of a parametric statistical model is the assumed distribution for the population. As we will come to see, many statistical methods are designed around assuming that our outcome is a *continuous random variable* drawn from a *normal distribution*. We will stick to this assumption for the remainder of this unit, but just note that this is not the *only* distribution we could assume for our population. Statistical theory is much more general than just the normal distribution. Nevertheless, we will assume that

$$
y_{i} \sim \mathcal{N}\left(\mu,\sigma^{2}\right).
$$

In order to integrate other variables into this framework, we can allow the distribution of $y$ to *change* depending upon the values of the predictors. Importantly, we assume that the *form* of the distribution stays the same (i.e. all the data remains normally distributed), but that the *parameters* of the distribution shift depending upon the values of other variables. The result of this is that both the *expected value* and the *variance* of the outcome can change, depending upon other measurements made during the experiment. For instance, we could allow the average value of `mpg` to change depending upon the value of `wt`. As another example, we could allow the mean of the distribution to change depending upon different experimental groupings (e.g. patients vs controls). In this latter case, we may also wish to allow the variance to change across the experimental groupings, as perhaps the controls are more consistent in their responses than the patients and thus have a smaller population variance.

In order to express this dependency, we use the notation $y_{i}|\mathbf{x}_{i}$, which is read as "the value of $y_{i}$ *given* the values in $\mathbf{x}_{i}$". This is known as a *conditional* statement[^conditional-foot] and reflects an assumed dependence between $y$ and $\mathbf{x}$. The values on the *right* of the vertical bar indicate some additional information that needs to be taken into account. To see how this works in terms of our model, we can specify

$$
y_{i}|\mathbf{x}_{i} \sim \mathcal{N}\left(\mu_{i},\sigma^{2}_{i}\right).
$$

So, this is saying that the distribution of $y_{i}$, given some values of the predictor variables, is a normal distribution with a mean given by $\mu_{i}$ and a variance given by $\sigma^{2}_{i}$. Because these both have a subscript, we allow both the mean and variance to change for each value of $y$. Although somewhat abstract at present, we will see how this allows us to accommodate predictor variables into the model using the concept of a *mean function* and a *variance function*. 
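As a minimal sketch of what this statement means in practice, the code below draws one value of $y_{i}$ for each observation, with each draw coming from a normal distribution that has its own mean and variance. The helper `simulate_outcome()` and the parameter values are hypothetical and chosen only for illustration.

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical helper: one draw of y_i from N(mu_i, sigma_i^2) per observation
simulate_outcome <- function(mu, sigma) {
  rnorm(n = length(mu), mean = mu, sd = sigma)
}

# Illustrative (made-up) parameters that differ across three observations
mu_i    <- c(15, 20, 25)
sigma_i <- c(1, 2, 4)   # standard deviations, i.e. square roots of the variances

y_sim <- simulate_outcome(mu_i, sigma_i)
y_sim
```

Note that `rnorm()` is parameterised by the standard deviation rather than the variance, so $\sigma_{i}$ is passed rather than $\sigma^{2}_{i}$.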

## Component 2: The Mean Function
In order to connect the mean of the assumed population distribution to the predictor variables, we define a *mean function*. The purpose of the mean function is to *convert* the $i$th values of the predictor variables into the expected value of $y_{i}$. How this conversion is achieved is one of the fundamental differences between different models. Formally, we can write the mean function as $E(y_{i}|\mathbf{x}_{i})$, which is just the expected value of $y_{i}$, but after taking the values of the predictor variables into account. For instance, this could be the expected value of `mpg` after we have taken the values of `wt` and `hp` into account. 

The simplest form of the mean function would be when we have *no* predictor variables. If we assume a fixed value for the variance (just to keep things simple), the probability model is then

$$
y_{i}|\mathbf{x}_{i} \sim \mathcal{N}\left(\mu_{i},\sigma^{2}\right).
$$

The expected value of a normal random variate is the mean of the distribution. As such, the expected value of $y_{i}|\mathbf{x}_{i}$ would be $\mu_{i}$. The key point of the mean function is how we determine what this value will be. In the case of no predictor variables, a sensible value would just be the overall mean of $y$. This gives a mean function of

$$
E\left(y_{i}|\mathbf{x}_{i}\right) = \mu_{i} = \mu.
$$

If we integrate the mean function into the probability model and drop the conditional notation (just to simplify things), we have a model with the form 

$$
y_{i} \sim \mathcal{N}\left(\mu,\sigma^{2}\right),
$$

which we can also express (given the discussion in the previous section) as

$$
\begin{align*}
    y_{i}         &= \mu + \epsilon_{i} \\
    \epsilon_{i}  &\sim  \mathcal{N}\left(0,\sigma^{2}\right).
\end{align*}
$$

So, irrespective of the value of $i$, this model assumes that the data are drawn from a normal distribution with a fixed mean and fixed variance. This is illustrated below within an interactive 3D plot using the `mtcars` data, with seven example normal distributions illustrated.

```r
library(rgl)

# Data
data('mtcars')
n     <- length(mtcars$mpg)
mu    <- mean(mtcars$mpg)
sigma <- sd(mtcars$mpg)

# Open 3D window
open3d()

# Set the desired ranges
xlim <- c(1,n)
ylim <- c(0,50)
zlim <- c(0,.2)

# Plot invisible points to define the bounds
plot3d(NA, xlim=xlim, ylim=ylim, zlim=zlim,
       type="n", axes=FALSE, xlab="", ylab="", zlab="")

# Plot the 3D scatter plot
points3d(seq(1,n), mtcars$mpg, rep(0,n), col="red", size=5)

# Overlay the 1D normal distribution curves
n.norms <- seq(1, n, length.out=7)

for (i in seq_along(n.norms)){
  # Curve data
  x.val   <- n.norms[i]
  y_curve <- seq(ylim[1], ylim[2], length.out=200)
  x_curve <- rep(x.val, length(y_curve))
  z_curve <- dnorm(y_curve, mean=mu, sd=sigma)
  
  # Normal curve
  lines3d(x_curve, y_curve, z_curve, col="blue", lwd=3)
  
  # Mean line
  lines3d(c(x.val,x.val), c(mu,mu), c(0,max(z_curve)), lwd=2, col="green3")
}

# Grand mean line
lines3d(c(0,n),c(mu,mu),c(0,0), lwd=2)

# Add axes and grid
axes3d(edges=c("x--","y--"), col="black")
grid3d(c("z"), col="gray")

# Add axis labels
mtext3d("Car", edge = "x--", line = 5)
mtext3d("MPG", edge = "y--", line = 5)

#==================================================#
# Adjust the viewport (only needed for the lesson) #
#==================================================#
mat       <- par3d("userMatrix")
nmat      <- mat
nmat[2,4] <- mat[2,4] + 15  # Move 15 along y

# Apply the new matrix to shift the camera
par3d(userMatrix=nmat)

# Zoom in
par3d(zoom=0.55)
```

```r
# Generate a HTML widget for embedding in the lesson
rglwidget(width=772)
```
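The constant-mean model can also be checked numerically. As a small sketch using base R, an intercept-only call to `lm()` (the formula `mpg ~ 1`) returns the sample mean of `mpg` as its single coefficient, and its residual standard deviation matches `sd()`. The object name `fit0` is just a label chosen here.

```r
# Intercept-only model: mpg is described by a single mean plus error
fit0 <- lm(mpg ~ 1, data = mtcars)

# The estimate of mu is the sample mean of mpg
coef(fit0)
mean(mtcars$mpg)

# The estimate of sigma is the sample standard deviation of mpg
sigma(fit0)
sd(mtcars$mpg)
```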

As a more complicated example, imagine we have a single predictor variable (i.e. $\mathbf{x} = \left\{x_{1}\right\}$) and we assume a straight-line relationship between our outcome and predictor. This gives a *simple linear regression* model, defined by the mean function

$$
E\left(y_{i}|x_{i1}\right) = \mu_{i} = \beta_{0} + \beta_{1}x_{i1}.
$$

We will discuss this in more detail next week, so do not worry about the specifics here. The main point is just the *concept* of a mean function. For simple linear regression, the full model is therefore (again, assuming a fixed variance):

$$
\begin{align*}
    y_{i}|x_{i1} &\sim \mathcal{N}\left(\mu_{i},\sigma^{2}\right) \\
    E\left(y_{i}|x_{i1}\right) &= \mu_{i} = \beta_{0} + \beta_{1}x_{i1}
\end{align*}
$$

which we can simplify to

$$
y_{i} \sim \mathcal{N}\left(\beta_{0} + \beta_{1}x_{i1},\sigma^{2}\right),
$$

or

$$
\begin{align*}
    y_{i}        &=    \beta_{0} + \beta_{1}x_{i1} + \epsilon_{i} \\
    \epsilon_{i} &\sim \mathcal{N}\left(0,\sigma^{2}\right)
\end{align*}
$$

So, this time, the mean of the distribution for the $i$th value of $y$ depends upon the values of the predictor variables via the regression equation. This is illustrated below using the `mtcars` data, this time showing how the means of the distributions shift using the value of the predictor variable `wt`.

```r
library(rgl)

# Data
data('mtcars')
n      <- length(mtcars$mpg)
wt.mod <- lm(mpg ~ wt, data=mtcars)

# Open 3D window
open3d()

# Set the desired ranges
xlim <- c(min(mtcars$wt),max(mtcars$wt))
ylim <- c(0,50)
zlim <- c(0,.5)

# Plot invisible points to define the bounds
plot3d(NA, xlim=xlim, ylim=ylim, zlim=zlim,
       type="n", axes=FALSE, xlab="", ylab="", zlab="")

# Plot the 3D scatter plot
points3d(mtcars$wt, mtcars$mpg, rep(0,n), col="red", size=5)

# Regression line
pred <- predict(wt.mod,newdata=data.frame("wt"=c(min(mtcars$wt),max(mtcars$wt))))
lines3d(xlim,c(pred[1],pred[2]),c(0,0), lwd=2)

# Overlay the 1D normal distribution curves
n.norms <- seq(xlim[1],xlim[2],length.out=7)
for (i in seq_along(n.norms)){
   # Model data
   x.val   <- n.norms[i]
   beta    <- coef(wt.mod)
   mu      <- beta[1] + beta[2]*x.val
   sigma   <- summary(wt.mod)$sigma
   
   # Curve data
   y_curve <- seq(ylim[1], ylim[2], length.out=200)
   x_curve <- rep(x.val, length(y_curve))
   z_curve <- dnorm(y_curve, mean=mu, sd=sigma)
   
   # Normal curve
   lines3d(x_curve, y_curve, z_curve, col="blue", lwd=3)
   
   # Mean line
   lines3d(c(x.val,x.val), c(mu,mu), c(0,max(z_curve)), lwd=2, col="green3")
}
 
# Add axes and grid
axes3d(edges=c("x--","y--"), col="black")
grid3d(c("z"), col="gray")

# Add axis labels
mtext3d("Weight", edge = "x--", line = 3)
mtext3d("MPG", edge = "y--", line = 3)
 
# Adjust the viewport
mat       <- par3d("userMatrix")
nmat      <- mat
nmat[2,4] <- mat[2,4] + 15  # Move 15 along y
 
# Apply the new matrix to shift the camera
par3d(userMatrix=nmat)
 
# Zoom in
par3d(zoom=0.45)
```

```r
# Generate a HTML widget for embedding in the lesson
rglwidget(width=772)
```
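The same mean function can be evaluated without any plotting. The sketch below fits the simple regression with `lm()`, extracts the estimates of $\beta_{0}$ and $\beta_{1}$, and computes $\mu_{i}$ by hand for every car. The names `fit1`, `beta` and `mu_i` are simply labels chosen for this example.

```r
# Simple linear regression of mpg on wt
fit1 <- lm(mpg ~ wt, data = mtcars)

# Estimates of the intercept (beta_0) and slope (beta_1)
beta <- unname(coef(fit1))
beta

# Mean function evaluated for each car: mu_i = beta_0 + beta_1 * wt_i
mu_i <- beta[1] + beta[2] * mtcars$wt

# These hand-computed means agree with the fitted values stored by lm()
all.equal(mu_i, unname(fitted(fit1)))

# The single, fixed estimate of sigma shared by every distribution
sigma(fit1)
```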

Very generally, the mean function can be thought of as defining how to *convert* between the values of the predictor variables and the values of the outcome. In the example above, the regression equation defined how to convert values of `wt` into values of `mpg`. This conversion is achieved using a simple straight line, but other forms of relationship are also possible, depending upon the data in question. Indeed, different mean functions are one of the main elements that distinguish between different statistical models. Importantly, there is nothing within this framework to say that the conversion has to be a *good one*, only that some function is needed to convert the units of the predictors into the units of the outcome. The art of model building is partly about choosing the best mean function for the data in question. For instance, we could easily choose a mean function that simply multiplies the value of `wt` by 10 to get it on a similar scale to `mpg`: 

$$
\begin{align*}
    y_{i}|x_{i} &\sim \mathcal{N}\left(\mu_{i},\sigma^{2}\right) \\
    E\left(y_{i}|x_{i}\right) &= \mu_{i} = x_{i} \times 10
\end{align*}
$$

This is a legitimate mean function, but it is unlikely to be a very good one in terms of accurately representing the data we have collected. Deciding whether the chosen mean function works for the data is a question of *model fit*, which is a key topic we will be covering later in this unit.
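As a rough illustration, the sketch below compares the "multiply by 10" mean function with a least-squares straight line by summing the squared distances between each observed `mpg` and its expected value. Using the residual sum of squares here is an informal choice made for this example, not a full treatment of model fit, and the helper `rss()` is a name introduced only for this sketch.

```r
y  <- mtcars$mpg
wt <- mtcars$wt

# Mean function 1: multiply weight by 10
mu_x10 <- wt * 10

# Mean function 2: least-squares straight line
mu_lm <- fitted(lm(mpg ~ wt, data = mtcars))

# Hypothetical helper: total squared distance between data and expected values
rss <- function(mu) sum((y - mu)^2)

c(times10 = rss(mu_x10), line = rss(mu_lm))
```

The fitted line cannot do worse on this measure, because least squares picks the straight line with the smallest possible sum of squared errors, and multiplying by 10 is just one particular straight line.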

```{admonition} The Mean Function as the *Predictable* Element
:class: tip
Thinking back to our discussion in the previous section, because the mean function defines the *expected value* of the population distributions, it can be thought of as defining the *predictable* element of the random variable $y$. When building a model, a core aim is to make sure the mean function appears to fit the data. If it does, then the mean function is assumed to provide a reliable simplification of the connection between our predictor variables and our outcome variable. As such, it must be capturing something of the data-generating process and thus will allow us to say something about the predictable patterns within the data we have collected. For instance, if the simple regression mean function appears to fit well, then we can simplify the relationships within our data in terms of a straight-line. We can then use the properties of this line (i.e. the *intercept* and *slope*) to reach conclusions about the magnitude and direction of this relationship, as well as make predictions about future values. 
```
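As a small sketch of using the mean function for prediction, the code below asks what `mpg` we would expect for a car of a given weight. The car itself is hypothetical; `wt` in `mtcars` is recorded in units of 1000 lbs, so `wt = 3` corresponds to 3000 lbs.

```r
fit1 <- lm(mpg ~ wt, data = mtcars)

# A hypothetical new car weighing 3000 lbs
new_car <- data.frame(wt = 3)

# Expected mpg from the mean function, via predict()
pred_mpg <- unname(predict(fit1, newdata = new_car))
pred_mpg

# The same value by hand: intercept + slope * weight
by_hand <- unname(coef(fit1)[1] + coef(fit1)[2] * 3)
by_hand
```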

```{admonition} Advanced: How do mean functions work with other distributions?
:class: warning, dropdown
To help understand the purpose of a mean function, it can be helpful to see how this works generically across different population distributions. As an example, consider assuming a binomial distribution of the form

$$
y_{i} | x_{i1} \sim \mathcal{B}\left(n,p_{i}\right).
$$

Here, the number of trials $n$ is fixed, but we allow the probability of success $p$ to differ across values of $y$. The expected value of a binomial random variate is given by $E(y) = n \times p$. As such, for the model above, the mean function would have the form

$$
E\left(y_{i}|x_{i1}\right) = n \times p_{i}.
$$

In order to take the predictor variables into account, we need some way of converting their values into a probability of success for the $i$th value of $y$. As such, we need a mean function of the form

$$
E\left(y_{i}|x_{i1}\right) = n \times p_{i}\left(x_{i1}\right),
$$

where $p_{i}\left(x_{i1}\right)$ is some function that can convert our single predictor into a probability. A common method is to use a *logistic* function, so the mean function has the form

$$
E\left(y_{i}|x_{i1}\right) = n \times p_{i} = n \times \frac{1}{1 + e^{-\left(\beta_{0} + \beta_{1}x_{i1}\right)}}.
$$

The specific details of this do not matter for the moment. The main point is that the mean function allows us to define a method for converting our predictor variables into the expected value of the assumed distribution. This works for *any* distribution, whether the mean is directly encoded in its parameters or not. 
```
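For those who opened the advanced box above, the logistic mean function can be sketched in a few lines. All of the numbers below (the number of trials, the coefficients and the predictor values) are made up for illustration and are not estimated from any data.

```r
n_trials <- 10               # hypothetical fixed number of trials
beta0    <- -4               # hypothetical intercept
beta1    <- 1.5              # hypothetical slope
x        <- c(1, 2, 3, 4, 5) # made-up predictor values

# Convert the predictor into a probability using the logistic function
eta <- beta0 + beta1 * x
p_i <- 1 / (1 + exp(-eta))

# Mean function: expected number of successes for each observation
mu_i <- n_trials * p_i
mu_i

# Base R provides the same logistic function as plogis()
all.equal(p_i, plogis(eta))
```

Whatever values the predictor takes, the logistic function returns a value between 0 and 1, so the expected value always stays between 0 and the number of trials.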

## Component 3: The Variance Function
Much like the mean function, in order to connect the variance of the assumed population distribution to the predictor variables, we define a *variance function*. The purpose of the variance function is to indicate how the uncertainty around the mean of the distribution changes in relation to the values of the predictor variables. An obvious example would be if our predictor variable related to two experimental groups and we want to assume both a different *mean* and different *variance* across the groups. Another example would be if we wanted to capture the variance changing as a function of some continuous predictor variable. Like the mean function, we can express the variance function formally as $\text{Var}\left(y_{i}|\mathbf{x}_{i}\right)$, which is read as the variance of $y_{i}$ *after* taking the values of the predictor variables into account. Using the `mtcars` example, this would be the variance of `mpg` after taking the values of `wt` and `hp` into account.

Similar to the mean function, the simplest situation is when we just assume a constant variance for all values of $y$. This gives a variance function of

$$
\text{Var}\left(y_{i}|\mathbf{x}_{i}\right) = \sigma^{2},
$$

meaning that, irrespective of the values of the predictor variables, the variance of $y$ is assumed to be the same for every observation. If we also assumed a constant mean then the full probability model for a single predictor would be

$$
\begin{array}{rlr}
  y_{i}|x_{i1}                        &\sim \mathcal{N}\left(\mu_{i},\sigma^{2}_{i}\right)  & \text{(Population distribution)} \\
  E\left(y_{i}|x_{i1}\right)          &= \mu_{i} = \mu                                      & \text{(Mean function)} \\
  \text{Var}\left(y_{i}|x_{i1}\right) &= \sigma^{2}_{i} = \sigma^{2}                        & \text{(Variance function)}
\end{array}
$$

which we can simplify to

$$
y_{i} \sim \mathcal{N}\left(\mu,\sigma^{2}\right),
$$

or

$$
\begin{align*}
    y_{i}        &=    \mu + \epsilon_{i} \\
    \epsilon_{i} &\sim \mathcal{N}\left(0,\sigma^{2}\right).
\end{align*}
$$

A single fixed variance is what we implicitly assumed for all the models in the previous section, and is what a lot of applied statistical models also assume, for reasons we will discuss later in the course. 

We could also assume a more complex variance function. For instance, in the example of `mpg`, we could specify a variance that is scaled by the value of `wt`

$$
\text{Var}\left(y_{i}|x_{i1}\right) = \sigma^{2}_{i} = \frac{\sigma^{2}}{x_{i1}}.
$$

This would cause the width of the population distribution to *decrease* as weight increased. For instance, perhaps at higher values the weight of the car becomes the primary factor that determines MPG. As such, any other factors that cause greater variability across cars at lower weights start to become less influential the heavier the car becomes. This is not necessarily what we see within this dataset (and this is not necessarily a very good variance function), but hopefully the principle is clear. This particular case is illustrated in the interactive 3D plot below. Move the viewpoint around to see how the width of the distributions *decreases* as weight increases.

```r
library(rgl)

# Data
data('mtcars')
n      <- length(mtcars$mpg)
wt.mod <- lm(mpg ~ wt, data=mtcars)

# Open 3D window
open3d()

# Set the desired ranges
xlim <- c(min(mtcars$wt),max(mtcars$wt))
ylim <- c(0,50)
zlim <- c(0,.5)

# Plot invisible points to define the bounds
plot3d(NA, xlim=xlim, ylim=ylim, zlim=zlim,
       type="n", axes=FALSE, xlab="", ylab="", zlab="")

# Plot the 3D scatter plot
points3d(mtcars$wt, mtcars$mpg, rep(0,n), col="red", size=5)

# Regression line
pred <- predict(wt.mod,newdata=data.frame("wt"=c(min(mtcars$wt),max(mtcars$wt))))
lines3d(xlim,c(pred[1],pred[2]),c(0,0), lwd=2)

# Overlay the 1D normal distribution curves
n.norms <- seq(xlim[1],xlim[2],length.out=7)
for (i in seq_along(n.norms)){
   # Model data
   x.val   <- n.norms[i]
   beta    <- coef(wt.mod)
   mu      <- beta[1] + beta[2]*x.val
   # Variance function sigma^2 / wt, so the standard deviation is sigma / sqrt(wt)
   sigma   <- summary(wt.mod)$sigma / sqrt(x.val)
   
   # Curve data
   y_curve <- seq(ylim[1], ylim[2], length.out=200)
   x_curve <- rep(x.val, length(y_curve))
   z_curve <- dnorm(y_curve, mean=mu, sd=sigma)
   # Rescale every curve to the same height so only the width differs
   z_curve <- z_curve / max(z_curve) * .1
   
   # Normal curve
   lines3d(x_curve, y_curve, z_curve, col="blue", lwd=3)
   
   # Mean line
   lines3d(c(x.val,x.val), c(mu,mu), c(0,max(z_curve)), lwd=2, col="green3")
}
 
# Add axes and grid
axes3d(edges=c("x--","y--"), col="black")
grid3d(c("z"), col="gray")

# Add axis labels
mtext3d("Weight", edge = "x--", line = 3)
mtext3d("MPG", edge = "y--", line = 3)
 
# Adjust the viewport
mat       <- par3d("userMatrix")
nmat      <- mat
nmat[2,4] <- mat[2,4] + 15  # Move 15 along y
 
# Apply the new matrix to shift the camera
par3d(userMatrix=nmat)
 
# Zoom in
par3d(zoom=0.45)
```

```r
# Generate a HTML widget for embedding in the lesson
rglwidget(width=772)
```
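The values produced by this variance function can also be tabulated directly. In the sketch below, the residual variance from the ordinary regression of `mpg` on `wt` is used as a stand-in for $\sigma^{2}$. This is an assumption made purely to get numbers on a sensible scale, and the weights in `wt_grid` are an arbitrary grid across the observed range.

```r
fit1   <- lm(mpg ~ wt, data = mtcars)
sigma2 <- sigma(fit1)^2   # stand-in value for sigma^2 (an assumption)

# Seven evenly spaced weights across the observed range
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 7)

# Variance function: Var(y | wt) = sigma^2 / wt
var_i <- sigma2 / wt_grid
sd_i  <- sqrt(var_i)      # the width of each distribution

data.frame(wt = wt_grid, variance = var_i, sd = sd_i)
```

If we wanted to estimate a model under this assumption, one option in base R would be `lm(mpg ~ wt, data = mtcars, weights = wt)`, because `lm()` treats its `weights` as inversely proportional to the variances.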

As another example, if $x_{i1}$ coded some form of *group membership* where $x_{i1} = 0$ was associated with all the data from the first group and $x_{i1} = 1$ was associated with all the data from the second group, we could have a variance function of the form

$$
\begin{align*}
    \text{Var}\left(y_{i}|x_{i1} = 0\right) &= \sigma^{2}_{1} \\
    \text{Var}\left(y_{i}|x_{i1} = 1\right) &= \sigma^{2}_{2}.
\end{align*}
$$

Here, each group is given its own variance term and thus we allow the width of the distribution to change depending upon the experimental grouping. This is a more useful variance function in theory, as it may be unreasonable to think that two distinct groups (such as patients and controls) would have the same degree of variability in measurements. However, this can be deceptively tricky to actually apply in practice. In addition, the variance function can also be used to define any *correlation* between different values of $y$, for instance, in cases of repeated measurements from the same subject. However, this is an additional complexity we will leave to one side until we reach the topic of *mixed-effects models* later in the course.
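As a rough sketch of a group-specific variance function, the code below uses the `am` column of `mtcars` (0 = automatic, 1 = manual transmission) as a stand-in for group membership. Choosing `am` as the grouping variable is an assumption made for illustration, and the sample variances are used as simple estimates of $\sigma^{2}_{1}$ and $\sigma^{2}_{2}$.

```r
# Group membership: 0 = automatic, 1 = manual
group <- mtcars$am

# One variance per group
var_by_group <- tapply(mtcars$mpg, group, var)
var_by_group

# Variance function evaluated for each car: look up the variance of its group
var_i <- unname(var_by_group[as.character(group)])
head(data.frame(am = group, variance = var_i))
```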

```{admonition} The Variance Function as the *Unpredictable* Element
:class: tip
Again, thinking back to our previous discussion, if the mean function captures the predictable element of our random variable, then the variance function captures the *unpredictable* element. In effect, the variance function seeks to capture the *structure* of the model errors. This involves capturing all the reasons why the data does not conform to the expected value. In the simplest case, we can just assume that there is a constant degree of non-conformity, irrespective of the value of any of the predictor variables. In other words, the variance function just produces a single value. In more advanced cases, the predictor variables can be used to dictate a more complex structure, including different variances for different data groupings as well as different forms of correlational structure. 
```

`````{topic} What do you now know?
In this section, we have explored the concept of a *parametric statistical model* more formally, particularly in terms of the 3 core elements of a model. After reading this section, you should have a good sense of:

- The difference between *outcome* and *predictor* variables, as well as alternative terms for these concepts.
- The concept that a parametric statistical model can be fully described by its *population distribution*, *mean function* and *variance function*.
- The role of the *population distribution* within a statistical model.
- The notion that we decide what expression to use for the mean function in order to generate an expected value of $y_{i}$ and, in doing so, dictate the form of the relationship between our outcome and predictors. 
- The notion that we decide what expression to use for the variance function in order to adjust the width of the population distribution and, in doing so, dictate how the width shifts across different values of the predictors. 
- The concept that different mean and variance functions produce *different models* and that our job, generally, is to find the mean and variance function that best fits our data.
`````

[^manip-foot]: The term "manipulation" is used very generally here. The experimental manipulation can either be entirely under the control of the experimenter (a so-called "true" experiment), or it may be naturally occurring. For instance, in the `mtcars` example, the weight of the cars was not directly manipulated, rather it was a naturally-occurring difference that was exploited for the purpose of the experiment. A similar example would be comparing patients and controls, as you cannot randomly assign people to either having a condition or not having a condition.

[^conditional-foot]: This can be directly compared with the programming concept of a *conditional statement*. We can think of the vertical bar as a way of writing an if-statement. For instance: "`if` $\mathbf{x}_{i}$ contains a specific set of values `then` the distribution of $y_{i}$ will take the following form".