## 13.12 Example using the birthweight data

As an example of the ideas discussed above, we will return to the multivariable linear regression model we defined previously, relating birthweight to length of pregnancy and mother's height. We will consider how this model can be presented and interpreted differently depending on the aims of the analysis.

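Recall that the model is defined as:

$$\text{Model 3: } y_i = \beta_0 + \beta_1 l_i + \beta_2 h_i + \epsilon_i $$

where, for the $i$th baby, $y_i$ is birthweight (ounces), $l_i$ is length of pregnancy (days) and $h_i$ is mother's height (inches).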
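For reference, Model 3 can be fitted in R with `lm()`, with both explanatory variables on the right-hand side of the formula. The sketch below is illustrative only: it borrows the column names `Gestational.Days` and `Maternal.Height` from the prediction code later in this section, assumes a hypothetical outcome column `Birth.Weight`, and simulates placeholder data with arbitrarily chosen coefficients, so its estimates are not the ones reported below.

```r
# Sketch: fitting Model 3 with lm() on SIMULATED placeholder data.
# Birth.Weight is a hypothetical column name; the true coefficients
# (-80, 0.5, 1.0) are arbitrary illustrative choices, not estimates.
set.seed(1)
n <- 1000
births <- data.frame(
  Gestational.Days = rnorm(n, mean = 280, sd = 16),  # l_i, days
  Maternal.Height  = rnorm(n, mean = 64, sd = 2.5)   # h_i, inches
)
births$Birth.Weight <- -80 + 0.5 * births$Gestational.Days +
  1.0 * births$Maternal.Height + rnorm(n, sd = 16)    # y_i, ounces

# y_i = beta_0 + beta_1 * l_i + beta_2 * h_i + epsilon_i
model3 <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = births)
coef(model3)  # estimates of beta_0, beta_1 (per day) and beta_2 (per inch)
```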

A DAG depicting the assumed relationships between the variables in the model is given below, followed by R output summarising the results of the model.

```{figure} Images/ExampleDAG.png
---
height: 600px
name: ExampleDAG
---
```

```r
# model3 is the lm() fit of Model 3 defined earlier in the chapter
summary(model3)
# 95% confidence intervals for beta_1 and beta_2 (the 2nd and 3rd coefficients)
confint(model3, parm = c(2, 3), level = 0.95)
```


### 13.12.1 Analysis of risk factors

Suppose our aim was to explore length of pregnancy and mother's height as potential "risk factors" for birthweight. We would then interpret these results as follows:

After adjusting for mother's height, each additional day of pregnancy was associated with an increase of 0.45 ounces (95\% CI: 0.39 to 0.51) in mean birthweight. After adjusting for length of pregnancy, each one-inch increase in mother's height was associated with an increase of 1.28 ounces (95\% CI: 0.90 to 1.65) in mean birthweight.

Since 0 is not included in the 95\% confidence interval for either height or length of pregnancy, we can conclude that there is evidence of conditional associations between mother's height and birthweight, and between length of pregnancy and birthweight.
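The estimates and intervals quoted above can be collected into one table, together with a check of whether each interval excludes 0. This is a sketch on simulated placeholder data (again with a hypothetical `Birth.Weight` column), so the numbers differ from those in the text; with the real data, the final lines could be applied to the fitted `model3` directly.

```r
# Sketch: estimates with 95% CIs, and whether each CI excludes 0.
# SIMULATED placeholder data; Birth.Weight is a hypothetical column name.
set.seed(2)
n <- 500
births <- data.frame(Gestational.Days = rnorm(n, 280, 16),
                     Maternal.Height  = rnorm(n, 64, 2.5))
births$Birth.Weight <- -80 + 0.5 * births$Gestational.Days +
  1.0 * births$Maternal.Height + rnorm(n, sd = 16)
model3 <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = births)

# Rows 2 and 3 are beta_1 and beta_2, matching parm = c(2, 3) above
est <- cbind(estimate = coef(model3), confint(model3, level = 0.95))[2:3, ]
# 1 = the interval lies entirely above or below 0 (evidence of association)
est <- cbind(est, excludes_zero = est[, "2.5 %"] > 0 | est[, "97.5 %"] < 0)
round(est, 2)
```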

### 13.12.2 Prediction analysis

Now suppose our aim was to predict the birthweight of future babies using information on their mother's height and the length of pregnancy. We are now less interested in the estimated regression coefficients and more interested in the predicted values. For example, suppose we wanted to predict the birthweight of a baby whose mother was 66 inches tall and whose pregnancy lasted 200 days. We would obtain the relevant predicted value and its 95\% confidence interval:

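```r
# Covariate values for the new baby: 200-day pregnancy, 66-inch-tall mother
new.data <- data.frame(Gestational.Days = 200, Maternal.Height = 66)
# Fitted value and 95% confidence interval for the mean birthweight
predict(model3, newdata = new.data, interval = "confidence", level = 0.95)
```

| fit | lwr | upr |
|---|---|---|
| 86.16838 | 81.30474 | 91.03201 |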

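Alternatively, we may wish to obtain the corresponding 95\% prediction interval:

```r
# Fitted value and 95% prediction interval for an individual new baby
predict(model3, newdata = new.data, interval = "prediction", level = 0.95)
```

| fit | lwr | upr |
|---|---|---|
| 86.16838 | 53.54902 | 118.7877 |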

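Based on the above results, we would predict that a baby whose mother was 66 inches tall and whose pregnancy lasted 200 days would weigh 86.17 ounces. The 95\% confidence interval indicates that the mean birthweight of all such babies lies between 81.30 and 91.03 ounces. The 95\% prediction interval is wider, because it also accounts for variation between individual babies: we estimate that 95\% of babies whose mother was 66 inches tall and whose pregnancy lasted 200 days would weigh between 53.55 and 118.79 ounces.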
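The difference between the two intervals can be made explicit. Both are centred on the fitted value $\hat{y}_0$, but the confidence interval for the mean uses only the standard error of the fit, $\hat{y}_0 \pm t_{n-3,\,0.975}\,\mathrm{SE}(\hat{y}_0)$, whereas the prediction interval also adds the residual variance of individual babies around that mean, $\hat{y}_0 \pm t_{n-3,\,0.975}\sqrt{\mathrm{SE}(\hat{y}_0)^2 + \hat{\sigma}^2}$. The sketch below rebuilds both by hand from `predict(..., se.fit = TRUE)` on simulated placeholder data with a hypothetical `Birth.Weight` column; the results should agree with `predict()`'s `interval` argument.

```r
# Sketch: rebuilding the confidence and prediction intervals by hand.
# SIMULATED placeholder data; Birth.Weight is a hypothetical column name.
set.seed(3)
n <- 500
births <- data.frame(Gestational.Days = rnorm(n, 280, 16),
                     Maternal.Height  = rnorm(n, 64, 2.5))
births$Birth.Weight <- -80 + 0.5 * births$Gestational.Days +
  1.0 * births$Maternal.Height + rnorm(n, sd = 16)
model3 <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = births)

new.data <- data.frame(Gestational.Days = 200, Maternal.Height = 66)
p <- predict(model3, newdata = new.data, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)  # p$df = n - 3 residual degrees of freedom

# Confidence interval for the MEAN birthweight: coefficient uncertainty only
conf_int <- p$fit + c(-1, 1) * t_crit * p$se.fit
# Prediction interval for ONE new baby: also adds residual variance sigma^2
pred_int <- p$fit + c(-1, 1) * t_crit * sqrt(p$se.fit^2 + p$residual.scale^2)

rbind(confidence = conf_int, prediction = pred_int)
# Built-in versions for comparison
predict(model3, newdata = new.data, interval = "confidence", level = 0.95)
predict(model3, newdata = new.data, interval = "prediction", level = 0.95)
```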

It is also important to present statistics indicating the predictive performance of the model. In this case, $R^2=0.1969$, indicating that the model accounts for only 19.7\% of the total variation in birthweight.
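$R^2$ can be read from the model summary. As a check on its meaning, the sketch below (simulated placeholder data, hypothetical `Birth.Weight` column) also computes it directly as one minus the ratio of the residual sum of squares to the total sum of squares.

```r
# Sketch: reading R^2 from summary() and recomputing it by hand.
# SIMULATED placeholder data; Birth.Weight is a hypothetical column name.
set.seed(4)
n <- 500
births <- data.frame(Gestational.Days = rnorm(n, 280, 16),
                     Maternal.Height  = rnorm(n, 64, 2.5))
births$Birth.Weight <- -80 + 0.5 * births$Gestational.Days +
  1.0 * births$Maternal.Height + rnorm(n, sd = 16)
model3 <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = births)

r2_summary <- summary(model3)$r.squared

# R^2 = 1 - RSS / TSS: share of total outcome variation explained by the model
rss <- sum(residuals(model3)^2)
tss <- sum((births$Birth.Weight - mean(births$Birth.Weight))^2)
r2_manual <- 1 - rss / tss

c(summary = r2_summary, manual = r2_manual)
```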

### 13.12.3 Causal inference

Finally, suppose our aim was to estimate the causal effect of length of pregnancy on birthweight. As shown in the DAG, we assumed that mother's height was a common cause of length of pregnancy and birthweight, and we therefore needed to adjust for it in the analysis to remove confounding bias.
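A small simulation can illustrate why adjusting for a common cause matters. The sketch below generates data from a DAG of the assumed shape (height affects both length of pregnancy and birthweight, and length of pregnancy affects birthweight), using arbitrary made-up effect sizes and hypothetical variable names, and compares the estimated coefficient for length of pregnancy with and without adjustment for height.

```r
# Sketch: confounding bias when a common cause is left out.
# SIMULATED data from a DAG of the assumed shape; variable names and all
# effect sizes are arbitrary illustrative choices.
set.seed(5)
n <- 5000
height <- rnorm(n, mean = 64, sd = 2.5)                      # common cause
days   <- 150 + 2 * height + rnorm(n, sd = 6)                # height -> length of pregnancy
bw     <- -200 + 0.5 * days + 3 * height + rnorm(n, sd = 10) # true effect of days = 0.5

unadjusted <- coef(lm(bw ~ days))[["days"]]           # ignores the confounder
adjusted   <- coef(lm(bw ~ days + height))[["days"]]  # adjusts for height, as in Model 3
c(true = 0.5, unadjusted = unadjusted, adjusted = adjusted)
```

With these made-up values, the unadjusted slope is noticeably larger than the true value of 0.5, because part of the association between length of pregnancy and birthweight is due to height; the adjusted slope should be close to 0.5.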

From the output above, we would report $\hat{\beta}_1=0.45$ with 95\% confidence interval (0.39, 0.51). Based on these results, we can conclude that length of pregnancy has a causal effect on birthweight (since 0 does not lie within the confidence interval). However, the validity of this conclusion relies on the assumption that there are **no unmeasured variables causing confounding bias**; this is a very strong assumption that can be difficult to justify.

It is important to understand that we cannot interpret $\hat{\beta}_2$ in the same way as $\hat{\beta}_1$ in this analysis. We interpreted $\hat{\beta}_1$ causally because, according to our DAG, mother's height was the only confounder of the association between length of pregnancy and birthweight, and we adjusted for it in the analysis. However, length of pregnancy is not a confounder of the association between mother's height and birthweight. Instead, it lies on the causal pathway between mother's height and birthweight, which means that part of the effect of mother's height on birthweight operates through length of pregnancy.

According to our DAG, the **total effect** of mother's height on birthweight comprises two parts: (1) the **direct effect** (represented by the arrow from height to birthweight) and (2) the **indirect effect** (represented by the causal pathway that runs through length of pregnancy). Because we adjusted for length of pregnancy in our analysis, we blocked the indirect pathway, and therefore $\hat{\beta}_2$ estimates the direct effect only, not the total effect.
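In a linear model fitted by ordinary least squares, this decomposition can be checked numerically: the coefficient of height in a model *without* length of pregnancy (the total effect) equals the coefficient of height in a model *with* length of pregnancy (the direct effect, the analogue of $\beta_2$) plus the product of the height-to-length-of-pregnancy slope and the length-of-pregnancy-to-birthweight slope (the indirect effect). The sketch below uses simulated data with arbitrary effect sizes and hypothetical variable names; the decomposition holds exactly in the sample, but giving its pieces a causal interpretation still rests on the DAG's assumptions.

```r
# Sketch: total effect = direct effect + indirect effect for height.
# SIMULATED data from a DAG of the assumed shape; variable names and all
# effect sizes are arbitrary (true direct = 3, indirect = 2 * 0.5 = 1).
set.seed(6)
n <- 5000
height <- rnorm(n, mean = 64, sd = 2.5)
days   <- 150 + 2 * height + rnorm(n, sd = 6)                # height -> length of pregnancy
bw     <- -200 + 0.5 * days + 3 * height + rnorm(n, sd = 10)

total  <- coef(lm(bw ~ height))[["height"]]     # length of pregnancy not adjusted for
full   <- lm(bw ~ days + height)                # same form as Model 3
direct <- coef(full)[["height"]]                # analogue of beta_2: direct effect only
b      <- coef(full)[["days"]]                  # analogue of beta_1
a      <- coef(lm(days ~ height))[["height"]]   # effect of height on length of pregnancy
indirect <- a * b

c(total = total, direct = direct, indirect = indirect,
  direct_plus_indirect = direct + indirect)
```

Here `direct` corresponds to what $\hat{\beta}_2$ targets in Model 3, whereas `total` is what a model omitting length of pregnancy would target; with these made-up values they differ by roughly the indirect effect of 1.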

It is a common mistake in medical research to interpret all the estimated regression coefficients from a multivariable model in the same way, but as our example has shown, this can be misleading. This problem is known in the literature as the **Table 2 fallacy**. 
