# **Week 11: Simple Linear Regression**

```
.------------------------------------.
|   __  ____  ______  _  ___ _____   |
|  |  \/  \ \/ / __ )/ |/ _ \___  |  |
|  | |\/| |\  /|  _ \| | | | | / /   |
|  | |  | |/  \| |_) | | |_| |/ /    |
|  |_|  |_/_/\_\____/|_|\___//_/     |
'------------------------------------'

```

In this workshop, we will explore how to perform linear regression in R through practical exercises.

## **Pre-Configuring the Notebook**

### **Switching to the R Kernel on Colab**

By default, Google Colab uses Python. To use R instead, switch the kernel manually via **Runtime > Change runtime type** and select R. This lets you run R code in the Colab environment.

However, this notebook is already configured to use R by default. Unless something goes wrong, you shouldn’t need to change the runtime type manually.

### **Importing Required Packages**
**Run the following lines of code**:

```r
# Do not modify

setwd("/content")

# Remove `MXB107-Notebooks` if it exists
if (dir.exists("MXB107-Notebooks")) {
  system("rm -rf MXB107-Notebooks")
}

# Clone the repository
system("git clone https://github.com/edelweiss611428/MXB107-Notebooks.git")

# Change working directory to "MXB107-Notebooks"
setwd("MXB107-Notebooks")

# Run the pre-configuration script (loads the required packages)
invisible(source("R/preConfigurated.R"))
```

**Do not modify the following**

```r
if (!require("testthat")) install.packages("testthat"); library("testthat")

test_that("Test if all packages have been loaded", {

  expect_true(all(c("ggplot2", "tidyr", "dplyr", "stringr", "magrittr", "knitr") %in% loadedNamespaces()))

})
```

## **Simple Linear Regression Model**

We introduced simple linear regression in the bivariate data summary workshop and lecture. This workshop goes a bit deeper and focuses on:

- Performing hypothesis tests for regression parameters
- Interpreting key quantities from `lm()` model outputs
- Computing confidence intervals and prediction intervals
- Interpreting diagnostic plots to assess model fit


### **Fitting Simple Linear Regression Models in R**

R provides the `lm()` function for fitting linear regression models. It has a formula interface, similar to `aov()`, `t.test()`, and other modeling functions in R. Linear models are very flexible and can be used for simple regression, multiple regression, and even ANOVA (since ANOVA is a special case of a linear model).

**Usage:**

```r
lm(formula, data, subset, weights, na.action, ...)
```

**Arguments:**

- `formula`: a model formula of the form `response ~ predictors` (e.g., `y ~ x` for simple linear regression)  
- `data`: a data frame containing the variables in the model  
- `subset`: an optional vector specifying a subset of observations to be used  
- `weights`: an optional vector of weights for weighted regression  
- `na.action`: a function that indicates what should happen when the data contain `NA`s (defaults to `getOption("na.action")`, which is `na.omit` in a standard R session)  
- `...`: additional arguments passed to lower-level modeling functions  
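As an illustration of the workflow listed above (fitting, reading the `lm()` output, intervals, and diagnostics), the sketch below uses R's built-in `cars` dataset. The choice of `cars`, the predictor value `speed = 20`, and the object names `cars_fit`, `ci_mean` and `pi_new` are assumptions made for this example only; they are not part of the workshop exercises.

```r
# Illustrative example: regress stopping distance on speed (built-in `cars` data)
cars_fit <- lm(dist ~ speed, data = cars)

summary(cars_fit)                # estimates, standard errors, t tests, R-squared
coef(cars_fit)                   # intercept and slope estimates
confint(cars_fit, level = 0.95)  # 95% confidence intervals for the parameters

# Intervals at a hypothetical new speed of 20
new_obs <- data.frame(speed = 20)
# Confidence interval for the MEAN stopping distance at speed = 20
ci_mean <- predict(cars_fit, newdata = new_obs, interval = "confidence", level = 0.95)
# Prediction interval for a SINGLE new car at speed = 20 (always wider)
pi_new <- predict(cars_fit, newdata = new_obs, interval = "prediction", level = 0.95)
ci_mean
pi_new

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(cars_fit)
par(mfrow = c(1, 1))
```

The prediction interval is wider than the confidence interval because it adds the variance of an individual observation to the uncertainty in the estimated mean.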



#### **Question 1.1**

Use the `aov()` function to test for the effects of `wool`, `tension`, and their interaction on the number of breaks in the `warpbreaks` dataset. Interpret the ANOVA table.



```r
aov(breaks ~ wool * tension, data = warpbreaks) %>% summary()

# Main effect of wool: p-value ≈ 0.058 > 0.05. At the 5% significance level, there is insufficient evidence against
# the null hypothesis that wool type has no effect on the number of breaks, after allowing for tension.
# If we are not committed to a strict significance level, this could be read as weak evidence against H0.
# Main effect of tension: p-value ≈ 0.0007 < 0.05. There is strong evidence that the mean number of breaks differs
# across tension levels, after allowing for wool type.
# Interaction (wool:tension): p-value ≈ 0.02 < 0.05. There is evidence against the null hypothesis in favour of the
# alternative that the effect of tension depends on wool type (and vice versa).
# Because the interaction is significant, the main effects should be interpreted with caution.

# Equivalently, compare each F statistic with its 5% critical value F(0.95; df1, df2),
# where df2 = 48 is the residual degrees of freedom
F_wool <- 3.765
F_tension <- 8.498
F_interaction <- 4.189

F_wool > qf(0.95, 1, 48)
F_tension > qf(0.95, 2, 48)
F_interaction > qf(0.95, 2, 48)
```
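Rather than typing the F statistics in by hand, they can be read directly from the ANOVA table. The sketch below is one way to do this, assuming the same model as in Question 1.1; the object names `wb_fit`, `wb_tab`, `f_crit` and `wb_check` are illustrative only. It relies on `summary()` of an `aov` fit returning a list whose first element is the ANOVA table as a data frame.

```r
# Fit the two-way ANOVA with interaction
wb_fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# The first element of summary() is the ANOVA table (a data frame)
wb_tab <- summary(wb_fit)[[1]]

# Keep only the effect rows (the Residuals row has no F statistic)
wb_tab <- wb_tab[!is.na(wb_tab[["F value"]]), ]

# 5% critical values: numerator df from the table, denominator df = residual df
f_crit <- qf(0.95, df1 = wb_tab[["Df"]], df2 = wb_fit$df.residual)

wb_check <- data.frame(
  term   = trimws(rownames(wb_tab)),  # row names are padded with spaces
  F      = wb_tab[["F value"]],
  F_crit = f_crit,
  reject = wb_tab[["F value"]] > f_crit
)
wb_check
```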


#### **Question 1.2**

Use an appropriate post-hoc method to perform pairwise comparisons between levels of `tension`. Interpret the results to identify significant differences. Comment on the directions of the differences.

```r
aov(breaks ~ wool * tension, data = warpbreaks) %>% TukeyHSD(which = "tension")

# M vs L: CI excludes 0. There is evidence that medium tension produces fewer breaks than low tension.
# H vs L: CI excludes 0. There is evidence that high tension produces fewer breaks than low tension.
# H vs M: CI includes 0. There is insufficient evidence of a difference between high and medium tension.

# All of these comparisons should be interpreted as allowing for wool type in the model.

# Technically, if a symmetric two-sided test is rejected, then the corresponding one-sided test in the same direction
# as the observed difference (`diff`) is also rejected at the same significance level.
# So, although TukeyHSD formally performs only two-sided tests for pairwise differences, it is still valid to
# describe the observed direction of the difference when reporting results.
```
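The reading of the Tukey intervals can also be done programmatically. This is a minimal sketch, assuming the same model as above; `wb_fit`, `tension_ci`, `excludes_zero` and `tension_summary` are illustrative names. It uses the fact that `TukeyHSD()` returns a list with one matrix per term, with columns `diff`, `lwr`, `upr` and `p adj`, and that `diff` is the first-named level minus the second (so `M-L < 0` means medium tension has fewer breaks than low).

```r
wb_fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# Restrict the Tukey comparisons to the tension term
tk <- TukeyHSD(wb_fit, which = "tension", conf.level = 0.95)
tension_ci <- tk$tension  # matrix with columns diff, lwr, upr, p adj

# A family-wise 95% CI that excludes 0 corresponds to p adj < 0.05
excludes_zero <- tension_ci[, "lwr"] > 0 | tension_ci[, "upr"] < 0

# diff = (first level) - (second level); negative means the first level has fewer breaks
direction <- ifelse(tension_ci[, "diff"] < 0, "first level fewer", "first level more")

tension_summary <- data.frame(
  comparison    = rownames(tension_ci),
  diff          = round(tension_ci[, "diff"], 2),
  excludes_zero = excludes_zero,
  direction     = direction,
  row.names     = NULL
)
tension_summary
```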


### **Question 2**

The `ToothGrowth` dataset records the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC).


```r
ToothGrowth %>% str()
```
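The output of `str()` shows that `dose` is stored as a numeric variable. If it is passed to `aov()` as is, it is treated as a single numeric term (1 degree of freedom) rather than as three separate dose groups. The sketch below shows one possible preparatory step, assuming the dose levels are meant to be treated as categories; the copy `tg` is a hypothetical name used so the built-in dataset is left unchanged. It does not fit the model.

```r
tg <- ToothGrowth

# `dose` is numeric in the built-in data
class(tg$dose)

# Convert dose to a factor so each dose level forms its own group
tg$dose <- factor(tg$dose)
levels(tg$dose)

# Check the design: number of animals per supplement/dose combination
table(tg$supp, tg$dose)
```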

#### **Question 2.1**

Fit a two-way ANOVA model using the `aov()` function to test for the effects of supplement type, dose level, and their interaction on tooth length. Interpret the ANOVA table.


<details>
<summary>▶️ Click to show the solution</summary>
Solution will be released at the end of the week!

</details>

#### **Question 2.2**


Use an appropriate post-hoc method to perform pairwise comparisons between dose levels. Interpret the results to identify significant differences. Comment on the directions of the differences.


<details>
<summary>▶️ Click to show the solution</summary>
Solution will be released at the end of the week!

</details>