## 12.1 Introduction

A parametric statistical model is an algebraic description of how one or more **outcome** variables are influenced by **covariates**. Such models are widely used in medical research. Some examples of questions that we can investigate using statistical models include:

   + Does birthweight increase with length of pregnancy?
   + Does taking drug A reduce inflammation more than taking drug B in patients with arthritis?
   + Can we predict the risk of heart disease for our patients? 
    
In the above examples, the outcome variables are birthweight, inflammation and heart disease. In the first two examples, the length of pregnancy and the drug taken (A or B) are the covariates. In the third example, no covariates are explicitly mentioned. However, when answering the third question, researchers may want to include as covariates a range of patient characteristics that are associated with the risk of heart disease. Examples include diet, exercise, comorbidities and medications.

Statistical models contain **population parameters** and representations of **uncertainty**. The population parameters are unknown quantities that we want to estimate from our sample and the uncertainty is a measure of the variability in the outcome variable that is not explained by the covariates. 
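
To make the distinction between population parameters and uncertainty concrete, here is a minimal R sketch that uses simulated data, so the population parameters are known in advance. It assumes a model of the form $Y = \beta_0 + \beta_1 x + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$. The "true" values ($\beta_0 = 10$, $\beta_1 = 2$, $\sigma = 3$) are hypothetical and chosen only for illustration, because in real data these quantities are unknown. `lm()` estimates $\beta_0$ and $\beta_1$ from the sample. `sigma()` returns the residual standard deviation, which estimates the variability in the outcome that the covariate does not explain.

```r
# Simulated example: all "true" values below are hypothetical, for illustration only
set.seed(2024)                  # make the simulation reproducible

beta0      <- 10                # hypothetical population intercept
beta1      <- 2                 # hypothetical population slope
sigma_true <- 3                 # hypothetical SD of variability not explained by x

n <- 500
x <- runif(n, min = 0, max = 20)                               # covariate
y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma_true)   # outcome

fit <- lm(y ~ x)

# Sample estimates of the population parameters (close to, but not exactly, 10 and 2)
coef(fit)

# Sample estimate of the unexplained variability (close to, but not exactly, 3)
sigma(fit)
```

A different random sample (for example, a different seed) gives slightly different estimates. This sample-to-sample variation is why we also need measures of uncertainty for the estimates themselves.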

This is the first of two lessons on linear regression. In this lesson, we will learn how to define linear regression models, how to estimate their population parameters and how to estimate measures of uncertainty. We begin by introducing the **simple linear regression model** (Section 3), which includes one outcome and one covariate. We then introduce the **multivariable linear regression model** (Section 4), which extends the simple linear regression model to situations with multiple covariates. In the final part of this lesson, we will learn how to conduct an **analysis of variance** of statistical models (Section 5). In the next lesson, we will discuss how statistical models can be used in the different types of investigation that were discussed last week. We will also discuss how the type of investigation influences the presentation and interpretation of results obtained using statistical models.

Before delving in, it is worth noting the different terminology that you may come across in the medical literature. Here, I have already used the terms outcome and covariates. Table 1 summarises alternative terms that may be used to describe the same concepts.

Outcome            | Covariates
-------------------| ---------
$Y$-variable       | $x$-variables
Dependent variable | Independent variables
Response variable  | Regressors
Output variable    | Input variables
                   | Explanatory variables
                   | Predictor variables

Table 1: Different terminology used for outcome and covariates
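
In R, the same distinction appears in model formulas: the outcome is written to the left of `~` and the covariates to the right. The short sketch below borrows column names from the data introduced in the next subsection, and the choice of covariates is purely illustrative. It only builds and inspects a formula object, so no data need to be loaded.

```r
# A model formula: outcome ~ covariates (illustrative choice of covariates)
f <- Birth.Weight ~ Gestational.Days + Maternal.Smoker

# Left-hand side of ~ : the outcome
outcome <- all.vars(f[[2]])

# Right-hand side of ~ : the covariates
covariates <- all.vars(f[[3]])

outcome      # "Birth.Weight"
covariates   # "Gestational.Days" "Maternal.Smoker"
```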

Finally, it is important to understand that statistical models make **assumptions** about the form of relationships between outcomes and covariates. Although we can examine our data to investigate the validity of these assumptions (using methods covered in the next lesson), we can never be certain that the model is correct. 



### 12.1.1 Data used in our examples 

For our examples, we will use data on babies and their mothers. The data set contains a random sample of 1,174 mothers and their newborn babies. The column `Birth.Weight` contains the birth weight of the baby, in ounces. `Gestational.Days` is the number of gestational days, that is, the number of days the baby was in the womb. There are also data on maternal age, maternal height, maternal pregnancy weight, and whether or not the mother was a smoker.

The following code can be used to download and look at the data:


```r
# Load the data
data <- read.csv('https://www.inferentialthinking.com/data/baby.csv')

# Look at the first six rows of the data (head() shows 6 rows by default)
head(data)
```


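Birth.Weight | Gestational.Days | Maternal.Age | Maternal.Height | Maternal.Pregnancy.Weight | Maternal.Smoker
-------------| ---------------- | ------------ | --------------- | ------------------------- | ---------------
120          | 284              | 27           | 62              | 100                       | False
113          | 282              | 33           | 64              | 135                       | False
128          | 279              | 28           | 64              | 115                       | True
108          | 282              | 23           | 67              | 125                       | True
136          | 286              | 25           | 62              | 93                        | False
138          | 244              | 33           | 62              | 178                       | False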
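
Before modelling, it can help to check how each column was read in and to put variables on a convenient scale. The sketch below is self-contained: rather than downloading the file, it rebuilds a small data frame from the six rows printed above. It then makes sure `Maternal.Smoker` is stored as a logical vector and adds two derived columns: birth weight in grams (using 1 ounce ≈ 28.3495 g) and gestational length in weeks. The data frame name `six` and the column names `Birth.Weight.g` and `Gestational.Weeks` are hypothetical additions and are not part of the original file.

```r
# Stand-in for `data`: the six rows printed by head(data) above.
# (With the full data set, run the same lines on `data` instead of `six`.)
six <- read.csv(text = c(
  "Birth.Weight,Gestational.Days,Maternal.Age,Maternal.Height,Maternal.Pregnancy.Weight,Maternal.Smoker",
  "120,284,27,62,100,False",
  "113,282,33,64,135,False",
  "128,279,28,64,115,True",
  "108,282,23,67,125,True",
  "136,286,25,62,93,False",
  "138,244,33,62,178,False"
))

# Check the type of each column as read in
str(six)

# Ensure the smoking indicator is logical; as.logical() accepts "True"/"False"
# text as well as values already stored as TRUE/FALSE
six$Maternal.Smoker <- as.logical(six$Maternal.Smoker)

# Hypothetical derived columns on more familiar scales
six$Birth.Weight.g    <- six$Birth.Weight * 28.3495   # ounces -> grams (approx.)
six$Gestational.Weeks <- six$Gestational.Days / 7     # days -> weeks

# Non-smokers (FALSE) and smokers (TRUE) in these six rows: 4 and 2
table(six$Maternal.Smoker)
```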
