# Introduction {-}

In this tutorial, we will learn how to perform multiple linear regression.

**Preparation and session set up**

Before working through this tutorial, install the required packages by running the code below this paragraph. If you have already installed these packages, you can skip this step. Installing all of the packages may take between 1 and 5 minutes, so there is no need to worry if it takes a while.


In [None]:
# install packages
install.packages("here")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("glmulti")
install.packages("car")
install.packages("rms")
install.packages("sjPlot")
install.packages("report")
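install.packages("ggcorrplot")
# readxl is needed for readxl::read_excel(), which is used to load the data
install.packages("readxl")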


Now that we have installed the packages, we activate them as shown below.



In [None]:
# activate packages
library(here)
library(dplyr)
library(ggplot2)
library(glmulti)
library(car)
library(rms)
library(sjPlot)
library(report)
library(ggcorrplot)


# Tutorial Activity {-}

Form groups and help each other to bring the data into the correct format, visualize the data, and perform the multiple regression.

# Task 1

Multiple linear regression is a more flexible alternative to ANOVA: in addition to categorical independent variables, it can handle independent variables of all types (for example, numeric ones).
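To make the link to ANOVA concrete, here is a minimal sketch. It uses R's built-in `PlantGrowth` data, which is not one of this tutorial's data sets and was chosen only for illustration. With a single categorical predictor, a linear model and a one-way ANOVA should yield the same F statistic.

```r
# PlantGrowth ships with base R: weight (numeric), group (factor with 3 levels)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# F statistic reported in the regression summary
f_lm <- unname(summary(fit_lm)$fstatistic["value"])

# F statistic for the group effect in the ANOVA table
f_aov <- summary(fit_aov)[[1]][["F value"]][1]

# compare the two values (allowing for floating point error)
all.equal(f_lm, f_aov)
```

The regression additionally reports one coefficient per non-reference group, which an ANOVA table does not show.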

We return to the data sets we analyzed in week 6. Here, the research question (RQ) is whether the courses differ in how satisfied students were with them and whether students' satisfaction is affected by the grade they received. Satisfaction is operationalized as SeCAT scores (the *secat* column), the *grade* a student received is categorized as high, medium, or low, and the course a student attended is provided in the *class* column.

Load the data set `week6g1.xlsx`. Visualize the data and perform a full regression analysis. 


In [None]:
dat1 <- readxl::read_excel(here::here("data", "week6g1.xlsx"))
# inspect
head(dat1)


Visualize data



In [None]:
dat1 %>%
  # convert all character columns to factors (requires dplyr >= 1.0.0)
  dplyr::mutate(dplyr::across(where(is.character), factor)) %>%
  ggplot(aes(class, secat, fill = class)) +
  geom_boxplot() +
  facet_grid(~grade)


Fitting a model



In [None]:
m1 <- lm(secat ~ class * grade, data = dat1)
# inspect results
summary(m1)


Model fitting (automated model selection)



In [None]:
mfit <- glmulti(secat ~ class * grade, data = dat1, crit = aic)
# inspect results
summary(mfit)


Define final minimal adequate model



In [None]:
m1 <- lm(secat ~ class + grade, data = dat1)
# inspect results
summary(m1)
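A choice between a model with and without an interaction can also be cross-checked by hand. The sketch below is an illustration only: it uses R's built-in `warpbreaks` data rather than `dat1`, and the model names `m_full` and `m_add` are hypothetical. It compares both models by AIC and with an F-test.

```r
# warpbreaks ships with base R: breaks (numeric), wool and tension (factors)
m_full <- lm(breaks ~ wool * tension, data = warpbreaks)  # with interaction
m_add  <- lm(breaks ~ wool + tension, data = warpbreaks)  # main effects only

# AIC for both models; a lower value indicates a better
# trade-off between goodness of fit and model complexity
aic_tab <- AIC(m_full, m_add)
aic_tab

# F-test: does adding the interaction significantly improve the fit?
anova(m_add, m_full)
```

If the simpler model has the lower AIC and the F-test is not significant, this supports dropping the interaction.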


Diagnostics: outliers?



In [None]:
plot(m1)
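Besides the diagnostic plots, influential data points can be listed numerically. The following sketch assumes a demonstration model fitted to the built-in `mtcars` data (not this tutorial's data) and uses the common rule of thumb of 4 / n as a cutoff for Cook's distance. This cutoff is a convention, not a strict criterion.

```r
# demonstration model on a data set that ships with base R
m_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance: how much the fitted values change if a point is removed
cooks <- cooks.distance(m_demo)

# rule-of-thumb cutoff (an assumption; other cutoffs are also in use)
threshold <- 4 / nrow(mtcars)

# names of the observations that exceed the cutoff
influential <- names(cooks)[cooks > threshold]
influential
```

Points flagged this way deserve a closer look, but they should not be removed automatically.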



Check multicollinearity



In [None]:
rms::vif(m1)
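To see what a variance inflation factor measures, it can be computed by hand for a numeric predictor: regress that predictor on all other predictors and take 1 / (1 - R²). The sketch below uses the built-in `mtcars` data and the hypothetical model `mpg ~ wt + hp + disp` purely for illustration.

```r
# VIF of wt in the model mpg ~ wt + hp + disp:
# regress wt on the remaining predictors and extract R squared
r2_wt <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared

# the higher R squared, the more wt is explained by the other predictors
vif_wt <- 1 / (1 - r2_wt)
vif_wt

# cross-check: for numeric predictors, the VIFs are the diagonal
# of the inverse of the predictors' correlation matrix
vif_all <- diag(solve(cor(mtcars[, c("wt", "hp", "disp")])))
vif_all
```

For categorical predictors with more than two levels, such as those in `m1`, functions like `car::vif()` report generalized VIFs instead, so the values are not directly comparable to this hand calculation.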



Effects



In [None]:
sjPlot::plot_model(m1, type = "pred", terms = c("grade", "class"))



Summarize



In [None]:
sjPlot::tab_model(m1)



Report



In [None]:
report::report(m1)



# Task 2

We now have a look at a new data set. The data come from a study in which language learners were asked to press a button if a word shown on a computer screen was a real word (like *dough* or *missed*). In addition, the learners completed an IQ test and a language proficiency test, and they stated how many hours they had slept before doing the experiment.

RQ: Which factors affect the reaction time with which learners identify a real word in a foreign language?

H1: All predictors impact the reaction time.

Load the data set `week10d2.xlsx`. Visualize the data and perform a full regression analysis. 


In [None]:
dat2 <- readxl::read_excel(here::here("data", "week10d2.xlsx"))
# inspect
head(dat2)


Visualize data



In [None]:
corr <- round(cor(dat2), 1)
ggcorrplot(corr)


In [None]:
dat2 %>%
  ggplot(aes(Proficiency, ReactionTime, color = IQ)) +
  geom_point()
  


Fitting a model



In [None]:
m2 <- lm(ReactionTime ~ Proficiency * IQ * Sleep, data = dat2)
# inspect results
summary(m2)


Model fitting (automated model selection)



In [None]:
mfit <- glmulti(ReactionTime ~ Proficiency * IQ * Sleep, data = dat2, crit = bic)
# extract best models
top <- weightable(mfit)
top <- top[1:5,]
# inspect top 5 models
top
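When candidate models are ranked by an information criterion, the criterion values are often converted into model weights that sum to 1. The sketch below shows the usual calculation with base R only. It is an illustration under assumptions: the three candidate models and the `mtcars` data are hypothetical stand-ins, and we assume (without having verified it) that the weights reported by `weightable()` follow the same formula.

```r
# three hypothetical candidate models on a built-in data set
cands <- list(
  full     = lm(mpg ~ wt * hp, data = mtcars),
  additive = lm(mpg ~ wt + hp, data = mtcars),
  wt_only  = lm(mpg ~ wt, data = mtcars)
)

# BIC for each candidate (lower is better)
bic_vals <- sapply(cands, BIC)

# difference of each model to the best (lowest) BIC
delta <- bic_vals - min(bic_vals)

# convert differences into weights that sum to 1
weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))
round(sort(weights, decreasing = TRUE), 3)
```

A weight close to 1 indicates that one model clearly outperforms the other candidates, while similar weights indicate that several models are about equally well supported.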


Define final minimal adequate model



In [None]:
m2 <- lm(ReactionTime ~ Proficiency + IQ + Sleep, 
         data = dat2)
# inspect results
summary(m2)


Diagnostics: outliers?



In [None]:
plot(m2)



Check multicollinearity



In [None]:
car::vif(m2)



Effects



In [None]:
sjPlot::plot_model(m2, type = "pred", terms = c("Proficiency", "IQ"))



Summarize



In [None]:
sjPlot::tab_model(m2)



Report



In [None]:
report::report(m2)



# Outro



In [None]:
sessionInfo()

