# Measuring Error From the Two-Group Model (COMPLETE)
## Chapter 7.5-7.8 Overview Notebook

In [None]:
# run this to set up the notebook
suppressMessages(library(coursekata))
suppressMessages(library(gridExtra))

# format notebook
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# temporarily add gf_resid and gf_squaresid
source("https://raw.githubusercontent.com/UCLATALL/stopwatch/refs/heads/main/gf_resid.R")
source("https://raw.githubusercontent.com/UCLATALL/stopwatch/refs/heads/main/gf_square_resid.R")

# get the data
anchor_data <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vR8CPolRTYxJ6eCszgpgcGvuQ6tyFNoraBbFrFXEbKFgZgouLAlrgEC6wyRqWwSZPTmS2Xpd09P6G9y/pub?gid=1344310652&single=true&output=csv")

# create models
empty_model <- lm(guess ~ NULL, data = anchor_data)
condition_model <- lm(guess ~ condition, data = anchor_data)

<div class="teacher-note">
    <b>Teacher Note:</b> In this section, students will refine their concept of what it means to “explain” variation. Whereas previously they could see in a graph that knowing an observation’s value on the explanatory variable could help them make a better guess as to its value on the outcome, now they can quantify how much better the guess would be.

- Students will understand that error around model predictions is an indicator of how well a model fits the data, and they will expand their concept of error from the empty model to the two-group model, understanding that for both models:
    - Error is based on residuals from model predictions (residual = data - model prediction);
    - Residuals can be visualized as the distance between the model prediction and the data point;
    - Residuals can be calculated for each observation using the `resid()` function;
    - Residuals perfectly balance each other out (some are negative, some are positive, and they sum to 0);
    - Residuals can be squared and summed to get an aggregate measure of error (called SST when it comes from the empty model, and SSE for any other model).
- Students will understand that by comparing error from the two-group model to error from the empty model, they can quantify the value of the two-group model over the empty model in terms of improved model predictions:
    - The two-group model generally reduces error compared to the empty model; that is, SSE is usually &lt; SST, except in the special case where the two group means are exactly the same (then SSE = SST).
    - The reduction in error depends on how different the two models' predictions are. If the group means are the same as the grand mean, there is no reduction in error; if the group means are similar to each other, the reduction is small; if the group means are very different, the reduction is greater.
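
The last point can be checked with a minimal sketch in base R. The two data frames and the helper `error_reduced()` are hypothetical, made up for illustration; they are not part of `anchor_data` or the coursekata package.

```r
# Hypothetical helper: error removed by a two-group model (SST minus SSE)
error_reduced <- function(df) {
  sst <- sum(resid(lm(guess ~ NULL, data = df))^2)       # empty model
  sse <- sum(resid(lm(guess ~ condition, data = df))^2)  # two-group model
  sst - sse
}

# Made-up data: both groups have the same mean (2)
same_means <- data.frame(
  condition = rep(c("a", "b"), each = 3),
  guess = c(1, 2, 3, 1, 2, 3)
)

# Made-up data: group means are far apart (2 and 12)
far_means <- data.frame(
  condition = rep(c("a", "b"), each = 3),
  guess = c(1, 2, 3, 11, 12, 13)
)

reduced_same <- error_reduced(same_means)  # essentially 0: no error reduced
reduced_far <- error_reduced(far_means)    # 150: most of the SST of 154
reduced_same
reduced_far
```

The spread within each group is identical in the two data frames, so the only thing that changes the reduction in error is how far apart the group means are.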

A <a href="https://docs.google.com/document/d/1AqRiIsPZjJOoprTXwlBcxZ-3m9cD3p-zLnpefp_yAc4/edit?tab=t.5y2a0ykmi2fk#heading=h.wjaasjj3pg90" target="_blank">printable student guided-notes worksheet</a> is available to go with this Jupyter notebook, as well as a student version of this notebook.
</div>

## 1 Review: The `anchor_data` dataset

In this notebook we will go back to our `anchor_data` dataset from the previous notebook. Remember, this dataset contained results from an experiment in which students were shown a card like this one and told not to turn it over until they were instructed to do so.

<br><div style="margin: 0 auto; font-family: Arial, sans-serif; border: 2px solid black; width: 380px; height: 220px; background-color: #F2F2F2; padding: 10px; box-sizing: border-box; font-size: 12pt; line-height: 1.4; font-weight: bold; text-align: left;">
  On the other side of this card is a math problem. When you are told to do so, turn over the card and look at it for 5 seconds. Then turn the card back over and write your estimate of the answer in the space below.<br><br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;___________________________
</div>

Once they turned it over, they were asked to look at it for 5 seconds, then turn the card back over and estimate the product of the 8 numbers. All students got the same 8 numbers (so the correct answer was the same), but half the students saw the numbers in ascending order (on the left) and the other half saw them in descending order (on the right).

<table border="1" style="font-size: 24px; margin-left: 0; border-collapse: collapse; table-layout: fixed; width: 100%;">
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: center; padding: 60px;">
        1 × 2 × 3 × 4 × 5 × 6 × 7 × 8
      </td>
      <td style="border: 1px solid black; text-align: center; padding: 60px;">
        8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
      </td>
    </tr>
  </tbody>
</table>

Students' guesses were stored in a dataset called `anchor_data`. Each row represents one student. The columns include which `condition` they were in and what their `guess` was.

## 2 Measuring Error From a Two-Group Model

In the previous notebook we created two models of `guess`:

- the `empty_model`, which makes the same prediction (the grand mean) for everyone, and  
- the `condition_model` (a two-group model), which makes different predictions for ascending vs. descending groups (using the mean for each group as the model prediction).

We previously learned how to measure error from the empty model. In this notebook we will learn how to measure error from the two-group model.
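
As a quick illustration of what "prediction" means for each model, here is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration (they are not the real `anchor_data` values), and the object names are hypothetical.

```r
# Made-up toy data: three students per condition
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

toy_empty <- lm(guess ~ NULL, data = toy)
toy_condition <- lm(guess ~ condition, data = toy)

# The empty model predicts the grand mean (1100) for every student
empty_pred <- predict(toy_empty)
empty_pred
mean(toy$guess)

# The two-group model predicts each student's own group mean (200 or 2000)
group_pred <- predict(toy_condition)
group_pred
tapply(toy$guess, toy$condition, mean)
```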

### 2.1 To start, let's fit and save two models of `guess`

Use the code cell below to review what's in the `anchor_data` dataset. Then fit and save two models of `guess`: `empty_model` (using the mean as the model), and `condition_model` (using the mean of each condition as the model).

In [None]:
# view contents of dataset
str(anchor_data)

# create and save models


# print the models


# COMPLETE

# view contents of dataset
str(anchor_data)

# create and save models
empty_model <- lm(guess ~ NULL, data = anchor_data)
condition_model <- lm(guess ~ condition, data = anchor_data)

# print the models
empty_model
condition_model

### 2.2 Run the code below to show the effect of `condition` on `guess`. Write code to overlay both of the models we just created on the same graph.

In [None]:
# code here
gf_jitter(guess ~ condition, data = anchor_data, width = .2) 

# COMPLETE
# code here
gf_jitter(guess ~ condition, data = anchor_data, width = .2) %>%
gf_model(empty_model) %>%
gf_model(condition_model)

<div class="discussion-question">

### 2.3 Discussion Questions:
- Where in the graph is the empty model?
- Where is the group model?
- How is it possible to have two different models of the same data?
    
</div>

<div class="teacher-note">
    
**Sample Responses:**
- The empty model is the horizontal line at around 2000 (more precisely 1946, the mean)
- The group model is the set of black horizontal lines at around 800 and 3000 (more precisely 786.1 and 3043, the group means)
- We specified and fit two different models -- the empty model and the group model. The same data can be used to create two different models (in this case one more complex than the other).

</div>


### 2.4 The graphs below show the same data with the empty model on the left, the condition model on the right. We have highlighted the same few data points in each graph.

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/7.5-7.7-overview-empty-cond-models.jpg">

<div class="guided-notes">

### 2.5 Draw in the residuals from the empty model on the left. Then try drawing in residuals from the condition model.  

Our measure of error from any model starts with residuals. Just draw the residuals for the black points and imagine the rest.
    
</div>


<div class="guided-notes">

### 2.6 We have drawn in some residuals by hand. We can also overlay residuals on a jitter plot using the R function `gf_resid()`. Write R code to add the model and residuals to the plot.
    
</div>

In [None]:
# modify this
#gf_jitter(guess ~ condition, data = anchor_data) %>%
#  gf_model(_) %>% 
#  gf_resid(_) 

# COMPLETE
gf_jitter(guess ~ condition, data = anchor_data) %>%
  gf_model(empty_model) %>% 
  gf_resid(empty_model) 

gf_jitter(guess ~ condition, data = anchor_data) %>%
  gf_model(condition_model, color = "firebrick", width=1) %>% 
  gf_resid(condition_model, color = "firebrick") 


<div class="discussion-question">

### 2.7 Discussion Question: A residual from the empty model is defined as the difference between the data point and the mean. How do you think we would define the residual from the condition model?

</div>

<div class="teacher-note">

**Teacher Note:** Residuals for any model are calculated as the data point minus the model prediction for that data point. Make sure students understand that a residual is not always measured from the grand mean (as it is for the empty model); it is measured from whatever the model predicts for that data point.

</div>


<div class="guided-notes">

### 2.8 Write a definition of *residual* that would work for any model.
    
</div>

## 3 Representing Residuals in GLM Notation

We have learned how to represent both the empty and two-group models in GLM notation. Let's pause to zero in on error and residuals in particular. How do we represent residuals in the notation of the GLM?

For all models, **DATA = MODEL + ERROR**. This being the case, we also can say that **ERROR = DATA - MODEL**. The residual for any data point can be calculated as the value of the DATA minus the MODEL prediction for that data point. 
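
The identity ERROR = DATA - MODEL can be verified directly. This is a minimal sketch in base R using made-up toy values (not the real `anchor_data`); the names `toy`, `toy_condition`, and `by_hand` are hypothetical.

```r
# Made-up toy data: three students per condition
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

toy_condition <- lm(guess ~ condition, data = toy)

# ERROR = DATA - MODEL: subtract each prediction from the observed value
by_hand <- toy$guess - predict(toy_condition)
by_hand  # -100, 0, 100, -1000, 0, 1000

# resid() returns the same values
from_resid <- resid(toy_condition)
all.equal(unname(by_hand), unname(from_resid))
```

The same subtraction works for the empty model; only the prediction being subtracted changes.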


<div class="guided-notes">

### 3.1 We have filled in the table below for the empty model. Fill in the corresponding information for the two-group model.  

</div>


<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:28%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:36%">Empty Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:36%">Two-Group Model</td>
    </tr>
  </thead>
  <tbody>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top;"><b>Residual definition</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">The difference between a data point and the mean (the empty model's prediction)</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">The difference between a data point and the mean of its group (the two-group model's prediction)</td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top;"><b>Model specification</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">$$\underbrace{Y_i}_{\text{DATA}} \;=\; \underbrace{b_0}_{\text{MODEL}} \;+\; \underbrace{e_i}_{\text{ERROR}}$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">$$\underbrace{Y_i}_{\text{DATA}} \;=\; \underbrace{(b_0 + b_1X_i)}_{\text{MODEL}} \;+\; \underbrace{e_i}_{\text{ERROR}}$$</td>
    </tr>
    <tr>
        <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top;"><b>Residual in GLM notation</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">$$e_i = Y_i - b_0$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"></td>
    </tr>
    <tr>
       <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top;"><b>Model prediction in GLM notation</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">$$\hat{Y}_i = b_0$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"></td>
    </tr>
    <tr>
       <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top;"><b>Residual in GLM notation (substituting model prediction for model)</b></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;">$$e_i = Y_i - \hat{Y}_i$$</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"></td>
    </tr>
  </tbody>  
</table>

## 4 Using Total Error to Compare Two Models

We have calculated total error from the empty model using SST (Sum of Squares Total). How do we calculate total error from the condition model (a two-group model)? And how do we use these calculations to compare the two models?

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/7.5-7.7-residuals-from-two-models.png">

<div class="discussion-question">

### 4.1 Discussion Questions: 
- Just by looking at the residuals in the jitter plot, which model seems to have less error overall? Explain your answer.
- Do you see positive and negative residuals in both plots? Why do you think that happens?
- If we add up all the residuals from the empty model, and then add up all the residuals from the group model, which do you think will be smaller?  Why?
</div>

<div class="teacher-note">
    
**Sample Responses:**
- The empty model has longer residual lines overall compared to the group model, which means more error.
    - In the ascending group, many points fall below the empty model line, and those residuals are longer than the ones around the group model.
    - In the descending group, many points fall above the empty model line, and those lines are also longer than the residuals around the group model.
    - In the group model, the points in each group are balanced above and below each group's prediction so there are shorter and more balanced residuals.
- Yes, there are positive and negative residuals in both plots. Both the empty model and the group model predict the mean (either the grand mean or the group means), so residuals are naturally balanced around those means.
- Neither will be smaller. Because both models predict a mean (either the grand mean or the group means), the residuals sum to zero for both models. (R may print a tiny nonzero number instead of exactly 0 because of floating-point rounding.)

</div>


### 4.2 Do residuals from the two-group model also add up to 0 (i.e., are they perfectly balanced)?

We learned that the residuals from the empty model always add up to 0. But is that also true for other models, such as the condition model? 

<div class="guided-notes">
    
### 4.3 Write R code to sum the residuals from the empty model and from the condition model.

</div>

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width: 16%"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width: 42%">Empty Model</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width: 42%">Group Model</td>
    </tr>
  </thead>
  <tbody>
    <tr style="height: 80px;">
      <td style="border: 1px solid black; font-weight: bold; text-align: left">R code to sum residuals</td>
      <td style="border: 1px solid black; text-align: left; width: 50%; vertical-align: top;"><br><br><code>________ ( resid(empty_model) )</code></td>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><br> </td>
    </tr>
    <tr style="height: 80px;">
      <td style="border: 1px solid black; font-weight: bold; text-align: left">Sum of residuals</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center"></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center;"></td>
    </tr>
  </tbody>  
</table>

In [None]:
# modify this code
# resid(empty_model)

# COMPLETE
sum(resid(empty_model))
sum(resid(condition_model))

<div class="teacher-note">

**Teacher Note:** Residuals around any mean (grand mean or group mean) always balance out, so their sum is 0. (R may print a tiny nonzero number in scientific notation instead of exactly 0 because of floating-point rounding.) That's why both the empty model and the group model have residuals that "cancel out." As we will see later, the same is true for regression models. 
    
As with the empty model, we'll need a different strategy to get total error: squaring and summing the residuals.  
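
If students are puzzled by a printed sum that is not exactly 0, a sketch like the following may help. It uses made-up toy values in base R (not the real `anchor_data`), and all object names are hypothetical.

```r
# Made-up toy data: three students per condition
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

toy_empty <- lm(guess ~ NULL, data = toy)
toy_condition <- lm(guess ~ condition, data = toy)

sum_empty <- sum(resid(toy_empty))
sum_condition <- sum(resid(toy_condition))

# Compare to 0 with a small tolerance rather than testing exact equality,
# because floating-point arithmetic can leave a tiny remainder
abs(sum_empty) < 1e-8
abs(sum_condition) < 1e-8
```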

</div>

### 4.4 Using squared residuals to compare two models

So far we’ve seen that residuals always balance around a mean, whether it’s the grand mean (empty model) or each group mean (condition model).  

That means if we just add up the residuals, both models look the same: the sum is always 0.  

To compare which model generates better predictions of the data, we need a better measure of **total error**, which brings us to the concept of **sum of squares** (SS): square each residual, then add them up.

We have already done this for the empty model; the sum of squared residuals from the empty model is called **SST**, or Sum of Squares Total. When we calculate sum of squares for the condition model (or any model other than the empty model) we call the result **Sum of Squares Error**, or **SSE**. 
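
Here is a minimal sketch of the two calculations in base R, using made-up toy values rather than the real `anchor_data`. The names `toy`, `toy_sst`, and `toy_sse` are hypothetical.

```r
# Made-up toy data: three students per condition
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

toy_empty <- lm(guess ~ NULL, data = toy)
toy_condition <- lm(guess ~ condition, data = toy)

# Square each residual, then add them up
toy_sst <- sum(resid(toy_empty)^2)      # SST is 6880000 for these values
toy_sse <- sum(resid(toy_condition)^2)  # SSE is 2020000 for these values

toy_sst
toy_sse
toy_sse < toy_sst
```

Unlike the plain sums of residuals, these two numbers differ, so they can be used to compare the models.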

<div class="guided-notes">

### 4.5 Draw in the squared residuals for the empty model and the group model.
    
Just draw the squared residuals for the black points and imagine the rest.
    
</div>


<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/7.5-7.7-overview-empty-cond-models.jpg">

<div class="guided-notes">

### 4.6 Write the R code to calculate the SS from the empty model and the group model
    
- The sum of squares from the empty model is called **SST** (Sum of Squares Total)
- The sum of squares from the group model is called **SSE** (Sum of Squares Error)  
    
</div>


In [None]:
# modify this
#resid(empty_model)

# COMPLETE
sum(resid(empty_model)^2)
sum(resid(condition_model)^2)


### 4.7 Make a prediction: Which SS will print out when we run `supernova(condition_model)`?

Then run the code below.

In [None]:
# run this
supernova(condition_model)

<div class="discussion-question">

### 4.8 Discussion Questions: 
    
- Which SS is smaller?
- What does that mean for our two models?
- What does the SS in the **Model (error reduced)** row mean? How was it calculated? (We will go into this more in the next notebook.)

</div>

<div class="teacher-note">

**Sample Responses:**  
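- SSE (the SS from the condition model) is smaller than SST (the SS from the empty model).
- This means the condition model has less squared error; less error means its predictions are closer to the data.
- The SS in the **Model (error reduced)** row is the amount of error the condition model removes compared to the empty model; it equals SST minus SSE.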
    
**Teacher Note:**  
- We want students to understand that these phrases go together: Smaller sum of squares --> less error --> better fit to the data.
- Because some of the total error is "reduced" in the group model compared to the empty model, we say it has "explained" some of the variation in the data.

</div>


<div class="discussion-question">

### 4.9 Discussion: For both models, the formula for **sum of squares** is: $SS = \sum (Y_i - \hat{Y}_i)^2$
    
- What is the $\hat{Y}_i$ in the empty model?
- What is the $\hat{Y}_i$ in the group model?    

</div>


<div class="teacher-note">

**Sample Responses:**  
- The $\hat{Y}_i$ in the empty model is the prediction (the grand mean) for `guess` regardless of condition.
- The $\hat{Y}_i$ in the group model is the prediction for each group (the mean for ascending, and the mean for descending).


</div>


<div class="discussion-question">

### 4.10 Discussion: What makes it hard to use sum of squares (SSE and SST) to compare the two models?

</div>

<div class="teacher-note">

**Sample Responses:** 
- SS numbers were huge (in the hundreds of millions)
- The units were strange (squared guesses rather than guesses).  

**Teacher Note:**  
- Use this to tell students we will be learning about a new statistic next time that helps us solve this problem... called PRE! 
- Also note that in the previous chapter we pointed out that SS is affected by sample size, which makes it hard to compare model fit across samples of different sizes. But in the current example, we are fitting two different models (empty and condition) to the same exact dataset, which makes the comparison of SS still useful.

</div>


### 4.11 Visualizing SS Total vs. SS Error

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/7.5-7.7-overview-venn2.jpg" width = 400 align="right">Let’s bring it all together with a visual.

Think of **SS Total** as representing all the variation in our outcome variable: a full circle (the shaded circle in the figure).

When we add an explanatory variable to our model, that variable might help explain some of the variation. So now, the variation is split:

- Some variation is *explained by the model* (or you could say *reduced by the model*)
- Some variation is *left over as error* (**SS Error**)
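
The split described above can be checked numerically. This is a minimal sketch in base R with made-up toy values (not the real `anchor_data`); the object names are hypothetical, and computing the explained part from the gap between the two models' predictions is one common way to do it, shown here as an illustration.

```r
# Made-up toy data: three students per condition
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

toy_empty <- lm(guess ~ NULL, data = toy)
toy_condition <- lm(guess ~ condition, data = toy)

toy_sst <- sum(resid(toy_empty)^2)      # all the variation
toy_sse <- sum(resid(toy_condition)^2)  # variation left over as error

# Variation explained by the model: squared distance between the
# two-group prediction and the empty model prediction, summed over rows
toy_ss_model <- sum((predict(toy_condition) - predict(toy_empty))^2)

# The two parts add back up to the whole
all.equal(toy_sst, toy_ss_model + toy_sse)
```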


<div class="guided-notes">

### 4.12 Where in the diagram do you see SST? Where do you see SSE?

</div>


## 5 Interpreting distributions of residuals

When we have made histograms and jitter plots so far we have been plotting the distribution of an outcome variable. But sometimes statisticians like to plot the residuals that are left over after fitting a model. Let's take a look at what the residuals from the condition model might look like when viewed as a distribution.

### 5.1 Let's first add a new variable to `anchor_data` called `condition_resid`. Then look at the data frame (e.g., using `head()`) to make sure the residuals have been saved.

In [None]:
# code here

# COMPLETE
anchor_data$condition_resid <- resid(condition_model)
head(anchor_data)

### 5.2 Below we have two jitter plots: one of the outcome (`guess`) by `condition` (on the left), and the other of residuals from the condition model (`condition_resid`) by `condition` (on the right). 

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/7.5-7.8-overview-outcomes-vs-residuals.png">

<div class="discussion-question">

### 5.3 What is similar between the two graphs? What is different?
    
- Why are the means of both groups on the residual graph the same? (We know there is a difference in `guess` across conditions.)
- Why are the means of both groups on the residual graph equal to 0?
- If we fit a condition model to each of these two outcome variables, which do you think would have a higher SST? Explain your thinking.

</div>


<div class="teacher-note">

**Sample Responses:**  
    
- Similarities:
    - The pattern of spread is similar in both graphs: the ascending group has less spread, and the descending group has more.
- Differences:
    - The group means differ in the outcome graph (ascending mean smaller than descending mean) but are the same in the residual graph (both 0).
    - The scales are different: the outcome graph uses students’ original guesses (from 0 to 10,000), while the residuals graph includes both positive and negative values (from -2500 to 7000), because residuals show how far each data point is above or below its model prediction.

- The means of the residuals are 0 because the residuals within each group sum to zero. The residuals are perfectly balanced around their group means.

- When discussing the SST question:
    - Some students may predict that the residuals will have a smaller SST because the values are numerically smaller.
    - Others may think SST stays the same, recalling that SST depends on the outcome variable’s overall variation.
    - Either way is fine; this question is meant to capture their predictions before they use R to calculate it in the next question.


</div>


### 5.4 Run the code below to fit the `resid_model` and then run `supernova()` on both the `resid_model` and the `condition_model`.

In [None]:
resid_model <- lm(condition_resid ~ condition, data = anchor_data)
supernova(condition_model)
supernova(resid_model)

<div class="discussion-question">

### 5.5 Discussion Question: Do these numbers make sense?
    
- Why is SST for `resid_model` the same as SSE for the `condition_model`?

</div>


<div class="teacher-note">

**Sample Responses:**
- The residuals represent what is left after fitting the condition model on guesses. So the SST for the residuals is the same as the SSE for the guesses.
- Because we already explained the group differences when we made the residuals (i.e., took out SSM), the SST for the residuals is the same as the SSE for the original guesses.

**Teacher Note:**
Students should recognize that the residuals represent the leftover error (SSE) after fitting the condition model. Once a model explains part of the variation (SSM), what remains, the residuals, becomes the new total variation (SST) if we start over.

Students may wonder why SSM is 0 in the condition model of the residuals. The reason is that the residuals already represent data with no group difference left (that's why the two group means are the same). So when we fit a condition model to the residuals, there’s no additional variation for condition to explain, meaning SSM = 0 and SST = 0 + SSE.
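
The same pattern can be reproduced on a small scale. This is a minimal sketch in base R with made-up toy values (not the real `anchor_data`); all object names are hypothetical.

```r
# Made-up toy data: three students per condition
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

toy_condition <- lm(guess ~ condition, data = toy)
sse_condition <- sum(resid(toy_condition)^2)

# Save the residuals, then treat them as a new outcome variable
toy$condition_resid <- resid(toy_condition)
resid_group_means <- tapply(toy$condition_resid, toy$condition, mean)
resid_group_means  # both group means are essentially 0

sst_resid <- sum(resid(lm(condition_resid ~ NULL, data = toy))^2)
sse_resid <- sum(resid(lm(condition_resid ~ condition, data = toy))^2)

# SST of the residuals matches SSE of the original condition model,
# and condition explains nothing further (SST - SSE is essentially 0)
all.equal(sst_resid, sse_condition)
sst_resid - sse_resid
```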
    
</div>