# Regression Models with a Quantitative Explanatory Variable

## Chapter 9.1-9.4 Overview Notebook

In [None]:
# run this to set up the notebook
library(coursekata)
library(gridExtra)

# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles_v2.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# read in data
study_data <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQjgMnwBMvsmAj9rP2OccPLjGZZUON9ifFqDav0IDo-F1fUqBgpGXoBK0Lhmqf1IApfcjC4LGnW5iaZ/pub?output=csv") %>%
  mutate(active_cat = factor(ntile(active,3), levels = c(1:3), labels = c("low", "medium", "high")))


## 1 From Group Models to Regression

Up to now, we have been building *group models* with categorical explanatory variables, which you can think of like this:

**Quantitative Outcome = Categorical Explanatory + Other Stuff**

In this notebook, we introduce models traditionally referred to as *regression models*, which have a quantitative explanatory variable, like this:

**Quantitative Outcome = Quantitative Explanatory + Other Stuff**

In most ways, regression models work just like group models, with only a few differences. Everything you know about fitting and interpreting group models will apply, as will the analysis of error around the model. Most important of all, the concept of **DATA = MODEL + ERROR** is exactly the same. So to dive into regression models, let's introduce a new dataset.

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-studying.png">

## 2 The `study_data` Dataset

How do you study when you have a test coming up? It turns out there are lots of ways to study. You can re-read the textbook, you can re-read your notes, you can re-write your notes, you can quiz yourself, you can explain key concepts to a friend, and so on. But a large body of research shows that not all methods of studying are equally effective. 

Walck-Shannon, Rowell & Frey (2021) asked college students to reflect on how they studied for a biology test. Before taking the test, students were given descriptions of various study techniques and asked to estimate what percentage of their study time they spent on each one. Here is a partial list of the techniques presented to students:

- Do practice problem sets
- Re-answer questions from an old exam
- Quiz yourself
- Explain concepts to yourself or others
- Synthesize notes
- Make diagrams
- Review outside content online
- Re-read notes you have taken in class
- Watch lecture video
- Re-read textbook
- Re-write/copy notes

Source: <a href="https://pubmed.ncbi.nlm.nih.gov/33444109/">Walck-Shannon, E. M., Rowell, S. F., & Frey, R. F. (2021). To what extent do study habits relate to performance?</a>

<div class="guided-notes">

### 2.1 Think of how you normally study for an exam. Estimate what percentage of your study time you spend in each activity. (Your numbers should add up to 100%.)
    
</div>


<div class="discussion-question">

### 2.2 Discussion Question: The researchers who collected data for this study categorized the first 6 strategies as *active* and the next 5 strategies as *passive*. Why do you think the first group of strategies is considered active? What makes the second group more passive?
</div>

<div class="guided-notes">
    
### 2.3 Add up your `active` percentage of study time by adding up your estimates for the first 6 study strategies.

What does this number say about how you tend to study? Is there a way you can make your studying more active?
    
</div>

### 2.4 About the data in `study_data` 

The `study_data` dataset comes from 60 college students taking a biology class. Each student reported their percentage of active study time just like we did. The dataset includes three variables:

- `active`: the percent of study time spent using active strategies
- `active_cat`: a categorical variable that indicates whether students are relatively `low`, `medium` or `high` in active study time (20 students in each group)
- `exam`: the percent correct on a biology exam

Use the code cell below to take a look at the data frame and examine the distributions of the three variables.

In [None]:
# code here

## 3 Exploring the Relationship of Exam Scores to Study Strategies

<div class="guided-notes">

### 3.1 What do you think the researchers' hypothesis was about the relationship between exam score and active studying? Write a word equation to represent their hypothesis. 

</div>


### 3.2 We have two explanatory variables that can be used to represent active study strategies (`active` and `active_cat`).  Use the code cell to visualize the relationship between `exam` and `active`, and between `exam` and `active_cat`.

What is the difference between these two explanatory variables?

In [None]:
# code here

<div class="discussion-question">

### 3.3 Discussion Question: Do you think the data support the researchers' hypothesis? What patterns do you notice in the graphs?
</div>

## 4 Reviewing the Three-Group Model

We already know how to create a three-group model. Because we have a categorical version of our study strategies variable (`active_cat`), we can start by creating the `group_model` of `exam`.

<div class="guided-notes">

### 4.1 Write the code you would use to fit the three-group model of `exam` based on `active_cat` (low, medium or high) in the second row of the table.
    
</div>


<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:20%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Three-Group Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Regression Model</td>
    </tr>
  </thead>
  <tbody>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center;"><b>Relationship</b></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active_cat</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active</code></td>
    </tr>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>R to Fit Model</b></td>
        <td style="border: 1px solid black; text-align: left; vertical-align: center;"><code>group_model <- </code></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><code>regression_model <- </code></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>Visualization of Model</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-group-model-to-label.jpg" alt="group model"></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-regression-model-to-label.jpg" alt="regression model"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 80px"><b>Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active_cat, data = study_data)

Coefficients:
     (Intercept)  active_catmedium    active_cathigh  
           74.55              3.10              4.85</code></pre></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active, data = study_data)

Coefficients:
(Intercept)       active</code></pre></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 120px"><b>Interpretation of Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><ul>
              <li>$b_0$:
              <li>$b_1$:
              <li>$b_2$:
          </ul></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><ul>
              <li>$b_0$:
              <li>$b_1$:
          </ul></td>
    </tr>
  </tbody>  
</table>

### 4.2 Fit and save `group_model`; overlay the model on the graph; print out the model estimates

In [None]:
# modify this code

gf_jitter(exam ~ active_cat, data = study_data, width=.2)




<div class="guided-notes">
    
### 4.3 Three parameters were estimated in this group model ($b_0$, $b_1$, and $b_2$). Label where each parameter estimate is represented on the visualization in the third row, and write what each parameter estimate means on the bottom row of the table.

</div>

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:20%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Three-Group Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Regression Model</td>
    </tr>
  </thead>
  <tbody>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center;"><b>Relationship</b></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active_cat</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active</code></td>
    </tr>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>R to Fit Model</b></td>
        <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>group_model <- lm(exam ~ active_cat, data = study_data)</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>Visualization of Model</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-group-model-to-label.jpg" alt="group model"></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-regression-model-to-label.jpg" alt="regression model"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 80px"><b>Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active_cat, data = study_data)

Coefficients:
     (Intercept)  active_catmedium    active_cathigh  
           74.55              3.10              4.85</code></pre></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active, data = study_data)

Coefficients:
(Intercept)       active</code></pre></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 120px"><b>Interpretation of Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><ul>
              <li>$b_0$:
              <li>$b_1$:
              <li>$b_2$:
          </ul></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><ul>
              <li>$b_0$:
              <li>$b_1$:
          </ul></td>
    </tr>
  </tbody>  
</table>

## 5 Fitting the Regression Model

Now let's see how we would fit and interpret a regression model. A regression model is similar to the group models you have already learned. There is still an outcome variable Y, and at least one X (an explanatory variable). The main difference is that now, the X is quantitative rather than categorical. 

In the group model, each X indicated group membership in a binary fashion (e.g., are you in the high active study group, X = 1, or not, X = 0). In a regression model, X represents a quantity (e.g., each student's percentage of active study time). 
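
To make the binary coding concrete, here is a minimal sketch in base R. It assumes a tiny made-up data frame called `toy` (one row per group, not the real `study_data`) and uses `model.matrix()` to display the X columns that `lm()` constructs for a categorical variable.

```r
# Hypothetical mini data frame (NOT the real study_data): one student per group
toy <- data.frame(
  active_cat = factor(c("low", "medium", "high"),
                      levels = c("low", "medium", "high"))
)

# model.matrix() shows the X columns that lm() would build from active_cat
X <- model.matrix(~ active_cat, data = toy)
print(X)
```

Each row gets a 1 in the column for its own group and 0 elsewhere; the `low` group is the reference, so it has 0 in both `active_catmedium` and `active_cathigh`. A quantitative X such as `active` needs no such coding because it is already a number.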

Try writing the model and R code yourself using the `Y ~ X` pattern you are already familiar with. But this time, instead of using `active_cat` as the explanatory variable, let's use `active`. You'll find that fitting a regression model in R is basically like fitting a group model!

<div class="guided-notes">
    
### 5.1 Write the code you would use to fit the regression model of `exam` based on `active` in the second row of the table. (Save the model as `regression_model`.)

</div>

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:20%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Three-Group Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Regression Model</td>
    </tr>
  </thead>
  <tbody>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center;"><b>Relationship</b></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active_cat</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active</code></td>
    </tr>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>R to Fit Model</b></td>
        <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>group_model <- lm(exam ~ active_cat, data = study_data)</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>Visualization of Model</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-group-model-to-label.jpg" alt="group model"></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-regression-model-to-label.jpg" alt="regression model"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 80px"><b>Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active_cat, data = study_data)

Coefficients:
     (Intercept)  active_catmedium    active_cathigh  
           74.55              3.10              4.85</code></pre></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active, data = study_data)

Coefficients:
(Intercept)       active</code></pre></td>
    </tr>      
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 120px"><b>Interpretation of Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><ul>
              <li>$b_0$: mean of low (predicted exam score when both Xs = 0)
              <li>$b_1$: adjustment added to intercept (mean of low) to get mean of medium
              <li>$b_2$: adjustment added to intercept (mean of low) to get mean of high
          </ul></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><ul>
              <li>$b_0$:
              <li>$b_1$:
          </ul></td>
    </tr>
  </tbody>  
</table>

### 5.2 Modify the code below to fit the regression model and save as `regression_model`; then overlay the model on the graph; and print out the parameter estimates.

In [None]:
# modify this code for a regression model
group_model <- lm(exam ~ active_cat, data = study_data)

gf_jitter(exam ~ active_cat, data = study_data, width=.2) %>%
  gf_model(group_model, color="firebrick")

group_model


<div class="guided-notes">
    
### 5.3 What are the best-fitting parameter estimates for the regression model? Fill out the R output shown in the regression column of the table. Label each estimate as either $b_0$ or $b_1$.

</div>

<div class="discussion-question">

### 5.4 Discussion Questions: Why does the regression model have fewer parameter estimates than the group model? Where do you see the regression model's parameter estimates ($b_0$ and $b_1$) in the visualization of the model?
</div>

<div class="guided-notes">

### 5.5 Label $b_0$ (the intercept) and $b_1$ (the slope) on the regression model visualization. 
    
</div>


<div class="guided-notes">

### 5.6 Write the interpretation of the two parameter estimates for the regression model in the last row of the table.

</div>


<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:20%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Three-Group Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Regression Model</td>
    </tr>
  </thead>
  <tbody>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center;"><b>Relationship</b></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active_cat</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>exam ~ active</code></td>
    </tr>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>R to Fit Model</b></td>
        <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>group_model <- lm(exam ~ active_cat, data = study_data)</code></td>
      <td style="border: 1px solid black; text-align: center; vertical-align: center;"><code>regression_model <- lm(exam ~ active, data = study_data)</code></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>Visualization of Model</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-group-model-complete.jpg" alt="group model"></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-regression-model-complete.jpg" alt="regression model"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 80px"><b>Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active_cat, data = study_data)

Coefficients:
     (Intercept)  active_catmedium    active_cathigh  
           74.55              3.10              4.85</code></pre></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;font-size: 10px;"><pre><code>Call:
lm(formula = exam ~ active, data = study_data)

Coefficients:
(Intercept)       active  
    70.6955       0.1293</code></pre></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top; height: 120px"><b>Interpretation of Parameter Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">
          <ul>
              <li>$b_0$: mean of low (predicted exam score when both Xs = 0)
              <li>$b_1$: adjustment added to intercept (mean of low) to get mean of medium
              <li>$b_2$: adjustment added to intercept (mean of low) to get mean of high
          </ul></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">
          <ul>
              <li>$b_0$: predicted exam score when active = 0
              <li>$b_1$: amount added to the predicted exam score for each one-percentage-point increase in active
          </ul></td>
    </tr>
  </tbody>  
</table>
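
The interpretations in the last row of the table can be checked with a little arithmetic. The sketch below copies the estimates from the R output in the table (the variable names such as `b0_group` are just illustrative) and shows how the group means are rebuilt from the group model, and that moving up one percentage point in `active` changes the regression prediction by exactly the slope.

```r
# Estimates copied from the R output shown in the table
b0_group <- 74.55   # intercept: mean of the low group
b1_group <- 3.10    # adjustment for the medium group
b2_group <- 4.85    # adjustment for the high group

# Group model: each group mean is the intercept plus that group's adjustment
group_means <- c(
  low    = b0_group,
  medium = b0_group + b1_group,
  high   = b0_group + b2_group
)
print(group_means)

# Regression model: intercept and slope from the R output
b0_reg <- 70.6955
b1_reg <- 0.1293

# Predictions one percentage point apart differ by the slope
pred_at_0 <- b0_reg + b1_reg * 0
pred_at_1 <- b0_reg + b1_reg * 1
print(pred_at_1 - pred_at_0)
```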

## 6 How the Regression Model Makes Predictions

Like all statistical models, a regression model is a function that uses the value of X (the explanatory variable) to generate a prediction of Y (the outcome variable). We can use the regression model to generate a predicted Y for every row in our dataset, based on each row’s value of X.
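
One way to picture "the model is a function" is to write it as an R function. This is only a sketch: `predict_exam()` is a hypothetical helper (not part of any package), and it uses the fitted estimates rounded to 70.7 and 0.13.

```r
# Hypothetical helper: the fitted regression model written as a function
# (estimates rounded from the R output: b0 = 70.7, b1 = 0.13)
predict_exam <- function(active) {
  70.7 + 0.13 * active
}

# Give it values of X (percent active study time), get back predicted Y
predictions <- predict_exam(c(10, 50))
print(predictions)
```

Because R arithmetic is vectorized, the same function returns one prediction per value of `active`, which is how a model can generate a prediction for every row of a data frame.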

We have already expressed the regression model with R code. Now let's represent it using GLM notation so we can see how it generates predictions.

In the table below, we have written three versions of the three-group model. Use that information to help you fill in the corresponding versions for the regression model: 
1. GLM notation; 
2. Model with variable names (instead of $Y$s and $X$s); 
3. Model with both variable names and parameter estimates (instead of $b$s).


<div class="guided-notes">

### 6.1 Write all three versions of the regression model in the table below   

</div>


<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:14%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:43%">Three-Group Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:43%">Regression Model</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>GLM Notation</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;">$$Y_i=b_0+b_1X_{1i}+b_2X_{2i}+e_i$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>With Variable Names</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;">$$\text{exam}_i=b_0+b_1\text{active_catmedium}_{i}+b_2\text{active_cathigh}_{i}+e_i$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: center; height: 80px"><b>With Estimates</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;">$$\text{exam}_i=74.55+3.10\text{active_catmedium}_{i}+4.85\text{active_cathigh}_{i}+e_i$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: center;"> </td>
    </tr>
  </tbody>  
</table>

<div class="discussion-question">

### 6.2 Discussion Question: What does $X_1$ represent in the three-group model? What does $X$ represent in the regression model? How are they different? How are they the same? 
</div>

<div class="guided-notes">

### 6.3 Use the best-fitting regression equation ($\hat{Y}_i = 70.7 + 0.13X_i$) to fill in the empty cells in the data table below  

</div>


<table border="1" style="font-size: 16px; border-collapse: collapse; width: 100%; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid black; text-align: center; width:14%">student</th>
      <th style="border: 1px solid black; text-align: center; width:14%">active</th>
      <th style="border: 1px solid black; text-align: center; width:14%">exam</th>
      <th style="border: 1px solid black; text-align: center;">prediction (equation form)</th>
      <th style="border: 1px solid black; text-align: center;">predicted exam score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: center;">A</td>
      <td style="border: 1px solid black; text-align: center;">0</td>
      <td style="border: 1px solid black; text-align: center;">51</td>
      <td style="border: 1px solid black; text-align: center;">  </td>
      <td style="border: 1px solid black; text-align: center;">  </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: center;">B</td>
      <td style="border: 1px solid black; text-align: center;">10</td>
      <td style="border: 1px solid black; text-align: center;">74</td>
      <td style="border: 1px solid black; text-align: center;">70.7 + 0.13*10</td>
      <td style="border: 1px solid black; text-align: center;">72.0</td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: center;">C</td>
      <td style="border: 1px solid black; text-align: center;">100</td>
      <td style="border: 1px solid black; text-align: center;">83</td>
      <td style="border: 1px solid black; text-align: center;">  </td>
      <td style="border: 1px solid black; text-align: center;">  </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: center;">D</td>
      <td style="border: 1px solid black; text-align: center;"> </td>
      <td style="border: 1px solid black; text-align: center;">78</td>
      <td style="border: 1px solid black; text-align: center;">70.7 + 0.13*(__)</td>
      <td style="border: 1px solid black; text-align: center;">72.0</td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: center;">E</td>
      <td style="border: 1px solid black; text-align: center;">50</td>
      <td style="border: 1px solid black; text-align: center;">  </td>
      <td style="border: 1px solid black; text-align: center;">70.7 + 0.13*50</td>
      <td style="border: 1px solid black; text-align: center;">77.2</td>
    </tr>
  </tbody>
</table>


### 6.4 Run the code below to create a new column in the data frame for the `group_model` predictions and then overlay these predictions on the plot of `exam` by `active`.

In [None]:
# add predictions from group model to study_data
study_data$group_predict <- predict(group_model)

# graph the group predictions
gf_point(exam ~ active, data = study_data) %>%
  gf_point(group_predict ~ active, data = study_data, color="blue", shape = 1)


<div class="guided-notes">
    
### 6.5 Write code below to create a new column in the data frame for the `regression_model` predictions and then overlay these predictions on the plot of `exam` by `active`, coloring the regression model predictions "red".
    
</div>

In [None]:
# modify this to add predictions from regression model to study_data
study_data$regression_predict <- predict()

# modify this to graph the regression predictions
gf_point(exam ~ active, data = study_data)

<table border="1" style="font-size: 16px; border-collapse: collapse; width: 100%; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid black; text-align: center; width:50%">Three-Group Model</th>
      <th style="border: 1px solid black; text-align: center; width:50%">Regression Model</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; font-size: 14px;"><code>study_data$group_predict <- predict(group_model)</code></td>
      <td style="border: 1px solid black; text-align: left; font-size: 14px;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; font-size: 14px;"><code>gf_point(exam ~ active, data = study_data) %>%
  gf_point(group_predict ~ active, color="blue")</code> </td>
      <td style="border: 1px solid black; text-align: left; font-size: 14px;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; font-size: 14px;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-group-model-predictions.jpg" alt="group model predictions as dots"></td>
      <td style="border: 1px solid black; text-align: left; font-size: 14px;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/9.1-9.4-regression-model-predictions.jpg" alt="regression model predictions as dots"></td>
    </tr>
  </tbody>
</table>
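
To see what `predict()` is doing under the hood, here is a small self-contained sketch. The `toy` data frame is made up purely for illustration (it is not the real `study_data`), so its estimates will differ from those in the tables above. The sketch checks two things: that `predict()` gives the same numbers as computing $b_0 + b_1X_i$ by hand, and that DATA = MODEL + ERROR holds for every row.

```r
# Hypothetical toy data (NOT the real study_data), just to show the mechanics
toy <- data.frame(
  active = c(10, 30, 50, 70, 90),
  exam   = c(70, 76, 75, 82, 81)
)

# Fit a regression model and pull out its two parameter estimates
toy_model <- lm(exam ~ active, data = toy)
b <- coef(toy_model)

# Predictions from predict() versus predictions computed by hand
toy$model_predict <- unname(predict(toy_model))
toy$by_hand <- b[["(Intercept)"]] + b[["active"]] * toy$active
same_predictions <- isTRUE(all.equal(toy$model_predict, toy$by_hand))
print(same_predictions)

# DATA = MODEL + ERROR: each exam score is its prediction plus its residual
rebuilt_exam <- toy$model_predict + unname(resid(toy_model))
data_is_model_plus_error <- isTRUE(all.equal(toy$exam, rebuilt_exam))
print(data_is_model_plus_error)
```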


<div class="discussion-question">

### 6.6 Discussion Questions: There are data from 60 students in this dataset.  
    
- How many model predictions (colored dots) are there for each model? How do you know?  
- What pattern do you see in the **blue dots** (group model predictions)?  
- What pattern do you see in the **red dots** (regression model predictions)?  
- What do these patterns tell you about the difference between the two types of models?

</div>

<div class="discussion-question">

### 6.7 Discussion Questions: Are the active study strategies worth it? How might you adjust your study strategies?

</div>