# Adding an Explanatory Variable to a Model (COMPLETE)
## Chapter 7.1-7.4 Overview Notebook

In [None]:
# run this to set up the notebook
suppressMessages(library(coursekata))
suppressMessages(library(gridExtra))

# get the data
anchor_data <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vR8CPolRTYxJ6eCszgpgcGvuQ6tyFNoraBbFrFXEbKFgZgouLAlrgEC6wyRqWwSZPTmS2Xpd09P6G9y/pub?gid=1344310652&single=true&output=csv") 

In [None]:
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

<div class="teacher-note">
    <b>Teacher Note:</b> This lesson has two main goals: 
    <ol>
        <li>
            Expand the concept of model from the empty model to one that includes an explanatory variable 
            (a two-group categorical explanatory variable). Students should understand that both the empty 
            model and the two-group model are models in that they both:
            <ul>
                <li>are functions that make a prediction for each observation (although the empty model makes only one unique prediction, the two-group model makes two)</li>
                <li>can be fit (parameters estimated) using the <code>lm()</code> function in R</li>
                <li>can be used to generate predictions using the <code>predict()</code> function in R</li>
                <li>can be represented in GLM notation</li>
            </ul>
        </li>
        <li>
            Connect how models make predictions to their parameter estimates ($b_0$ and $b_1$) and variables ($X_i$). Students should be able to:
            <ul>
                <li>Connect predictions (in this case the means of each group) to parameter estimates (and vice versa)</li>
                <li>Connect predictions to variables (and vice versa)</li>
                <li>Understand the difference between parameter estimates and variables</li>
            </ul>
        </li>
    </ol>
    A <a href="https://docs.google.com/document/d/1zi5rDI0BsjZCd97j1OeTDZCW_XdjTiQ5IKUgpEwDPRw/edit?usp=sharing" target="_blank">printable student guided-notes worksheet</a> is available to go with this Jupyter notebook, as well as a student version of this notebook.
</div>

## 1. Data Collection Activity

<br>
<div style="margin: 0 auto; font-family: Arial, sans-serif; border: 2px solid black; width: 380px; height: 220px; background-color: #F2F2F2; padding: 10px; box-sizing: border-box; font-size: 12pt; line-height: 1.4; font-weight: bold; text-align: left;">
  On the other side of this card is a math problem. When you are told to do so, turn over the card and look at it for 5 seconds. Then turn the card back over and write your estimate of the answer in the space below.<br><br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;___________________________
</div>

<div class="teacher-note">
    <b>Teacher Note:</b> Hand out a card to each student. Tell them to keep the instructions side up, and the side with the math problem down. Half the students will get a multiplication problem in <b>ascending</b> order, the other half, in <b>descending</b> order. (The answer, therefore, is the same in both cases.) When students are done, collect all the cards.
    
Download a printable sheet for making the <a href="https://docs.google.com/document/d/17QcRs2dXqrpDWrcDmO6OTZkIDoJ5ktgRpWCKjb1rNXU/edit?usp=sharing" target="_blank">data collection cards</a>.
</div>

### 1.1 Everyone was given the same numbers to multiply. Do you expect all their guesses to be the same? Why or why not?

<div class="teacher-note">    
    <b>Sample Responses:</b> 
    
- The correct answer to the multiplication problem should be the same for everyone. 
- A lot of people won't guess the correct answer so there would be a lot of variation in the guesses. Some people are better or faster at mental arithmetic than others; some may take the task more seriously; some people might be better at estimation; etc.

</div>

## 2. Explore Variation in `guess`
We have collected some data from you. We have a similar data set collected from a few other classes of students; we will start by analyzing that.

### 2.1 Run the code below to create a jitter plot to visualize variation in guesses

In [None]:
# run this code; no need to edit
gf_jitter(guess ~ "x", data = anchor_data, width = .1)

### 2.2 Look back at the code. What is the outcome variable? What is the data frame?

<div class="teacher-note">
    
<b>Sample Responses:</b>         

- outcome variable - guess
- data frame - anchor_data

<b>Teacher Note:</b>
- Students may want to discuss things they notice about the data. For example, someone usually brings up the outlier (~10,000). You may want to hint that this person is the closest to the right answer. But don't tell them the correct answer to the multiplication problem yet! They don't yet know that there were two different orders.
    
</div>

### 2.3 If we randomly chose one more student and added them to the data set, what would you predict their guess would be?

<div class="teacher-note">
    
<b>Sample Response:</b> Students might say there is no way to tell; some might say the mean.
    
<b>Teacher Note:</b> We want students to remember from the previous chapters that, without other information, the mean is the best single prediction. It is very unlikely to be exactly right, but it balances the errors: the deviations above the mean and the deviations below it sum to zero.
</div>

## 3. Review the Empty Model
### 3.1 Fit the empty model of `guess`; save it as `empty_model`; print out the model

In [None]:
# sample response

# fit and save the empty_model of guess
empty_model <- lm(guess ~ NULL, data = anchor_data)
empty_model

### 3.2 What does the 1946 represent?

<div class="teacher-note">
<b>Sample Responses:</b> 

- The number 1946 is the mean of the distribution of guess;
- It also is the best-fitting parameter estimate $b_0$ for the empty model.
    
</div>
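
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> If students doubt that the empty model's estimate is just the mean, a quick check on a small made-up data set may help. The data frame <code>toy</code> and the names <code>toy_empty</code> and <code>b0_toy</code> below are hypothetical and are not part of the class data; only base R is used.

```r
# Hypothetical toy data (NOT the class data), just for illustration
toy <- data.frame(guess = c(200, 500, 800, 1000, 3000, 5000))

# Fit the empty model the same way the notebook does
toy_empty <- lm(guess ~ NULL, data = toy)

# Pull out the single parameter estimate, b0
b0_toy <- unname(coef(toy_empty)["(Intercept)"])

# Compare b0 with the mean of the outcome variable
b0_toy
mean(toy$guess)
```

The two printed values should match, which is the same relationship the notebook's output shows for `guess` in `anchor_data`.
</div>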

<div class="guided-notes">
    <h3>3.3 Draw in the empty model prediction on the graph</h3>
</div>

<div class="guided-notes">
    <h3>3.4 Write the code to overlay the <code>empty_model</code> on the jitter plot</h3>
</div>

In [None]:
# sample response

# add the empty_model to the plot
gf_jitter(guess ~ "x", width = .1, data = anchor_data) %>%
  gf_model(empty_model)

### 3.5 Does this look like your drawing of the empty model prediction on the jitter plot? 

<div class="teacher-note">
<b>Teacher Note:</b> 

- Even if they intended to draw in the mean, some students may have under- or overestimated the mean. 
- Overestimation: In skewed distributions, sometimes students simply cut the range in half. You may want to discuss why the actual mean is lower than just the middle of the range estimated from the graph (0 to 10,000). Sometimes students don't consider that the many small guesses tend to pull the mean down.
    
</div>
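
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> To make the overestimation point concrete, the lines below compare the middle of the range with the mean for a small right-skewed set of numbers. The values in <code>skewed_guesses</code> are invented for illustration and are not taken from the class data.

```r
# Hypothetical right-skewed guesses: many small values, one very large one
skewed_guesses <- c(100, 200, 300, 500, 800, 1200, 10000)

# "Cutting the range in half" gives the midpoint between min and max
middle_of_range <- (min(skewed_guesses) + max(skewed_guesses)) / 2

# The mean uses every value, so the many small guesses pull it down
mean_guess <- mean(skewed_guesses)

c(middle_of_range = middle_of_range, mean = mean_guess)
```

In this made-up example the mean falls well below the middle of the range, which is the pattern students tend to miss when they estimate the mean by eye.
</div>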

## 4. Adding an explanatory variable to the model

The empty model isn't very good - there is lots of error around the model prediction. We might be able to make a better model (i.e., one with less error) if we had an explanatory variable. 

It turns out we do have a possible explanatory variable in our data that might help us make a better model. Run the code below to check out the variable called `condition`. Half the students in our class were randomly presented with the problem 1x2x3x4x5x6x7x8 (the `ascending` condition). The other half got 8x7x6x5x4x3x2x1 (the `descending` condition). 

In [None]:
# run this code
head(anchor_data)

<div class="discussion-question">
<b>Key Discussion Question:</b> Do you think condition would have an impact on guess? Why or why not?
</div>

<div class="teacher-note">
<b>Teacher Note:</b> 

- This is an important moment to get them to think about the Data Generating Process (or the Guess Generating Process). 
- Daniel Kahneman (who later won a Nobel Prize) and Amos Tversky reported this experiment with high school students in 1974! They hypothesized that people perform the first few multiplications and then adjust their estimates from there. The initial numbers act as an "anchor," influencing their guesses. When smaller numbers appear first, estimates are anchored at smaller products; when larger numbers come first, estimates are anchored at larger products. This anchoring effect is why we named the dataset "anchor_data."
- The real answer is 40,320, but students typically don't guess anywhere near that high. 
    
</div>
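
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> The anchoring idea can be illustrated with running products. The code below shows that both orders reach the same final answer, while the partial products after the first few steps differ a great deal. Stopping after four multiplications is an assumption made here for illustration; it is not a detail taken from the original study.

```r
# The two orders used on the cards
ascending  <- 1:8
descending <- 8:1

# Both orders give the same final product, 8!
prod(ascending)
prod(descending)

# Running product after each multiplication step
cumprod(ascending)
cumprod(descending)

# Hypothetical stopping point: after the first four numbers
after_four_ascending  <- cumprod(ascending)[4]   # 1 x 2 x 3 x 4
after_four_descending <- cumprod(descending)[4]  # 8 x 7 x 6 x 5
```

After four steps the ascending product is 24 and the descending product is 1680, so someone who stops early and adjusts from there starts from a very different anchor in each condition.
</div>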

In [None]:
# sample response

# R code to calculate 8!
factorial(8)

<div class="guided-notes">
    <h3>4.1 Write two word equations: one to represent the hypothesis that condition would cause variation in guess; the other to represent the idea that it would <i>not</i> have an impact.</h3> Where would each word equation go on your guided notes? Which represents the empty model, which the condition model?
</div>

<div class="teacher-note">
    
<b>Sample Responses:</b>
    
- guess = condition + other stuff
- guess = mean + other stuff
    
<b>Teacher Note:</b> We want students to understand that the empty model word equation represents the idea that any explanatory variable that might be available has no effect.
</div>

### 4.2 Make a jitter plot to explore the effect of `condition` on `guess`.

In [None]:
# sample response

# make a jitter plot
gf_jitter(guess ~ condition, width = .2, data = anchor_data)

<div class="discussion-question">
<b>Key Discussion Question:</b> Does condition explain some of the variation in guess? (If you know what condition a student is in, could you make a better prediction of what their guess would be?) What do you think the model predictions would be for the condition model?
</div>

<div class="teacher-note">
<b>Sample responses:</b> 
    
- They would probably predict a higher number for a guess from someone in the descending condition and a lower one for someone in the ascending condition.
- Have students make specific numerical predictions. They may suggest numbers such as 500 or 750 for the "ascending" group and 2500 or 3000 for the "descending" group.
- Some students may suggest calculating the means. If so, you can teach them `favstats(guess ~ condition, data = anchor_data)` in advance.
    
</div>

<div class="guided-notes">
    <h3>4.3 Draw in the condition model predictions on the graph</h3> Draw your prediction lightly; you may want to revise it later.
</div>

<div class="teacher-note">
<b>Teacher Note:</b> We want students to notice that if they know what condition someone is in, they would make a different prediction about their guess. Also: Because there are only two levels of <b>condition</b>, there should be only two different predictions.
</div>

<div class="guided-notes">
    <h3>4.4 Fit the condition model using R</h3>
On your guided notes, find the code we used to fit the empty model (<b>guess = mean + other stuff</b>). Try to figure out the code to fit and save the condition model (<b>guess = condition + other stuff</b>).
</div>

In [None]:
# sample response

# write code to fit (and save) the condition model
condition_model <- lm(guess ~ condition, data = anchor_data)

<div class="guided-notes">
    <h3>4.5 Add some code to overlay the condition_model on the jitter plot</h3>
</div>

In [None]:
# sample response

# add the model
gf_jitter(guess ~ condition, data = anchor_data, width = .1) %>%
  gf_model(condition_model)


### 4.6 What is the model prediction for students in the ascending group? In the descending group? Are they close to what you expected?

<div class="teacher-note">
<b>Teacher Note:</b> Have students correct their drawings if necessary. We want students to notice that this model makes two different predictions, whereas the empty model made only one. We also want them to hypothesize that the predictions are the means of the two groups. You can use favstats() in the code cell below to verify whether the predictions are the group means.
</div>
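
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> The check that the predictions are the group means can also be rehearsed on a small made-up data set. The data frame <code>toy</code> and every name below are hypothetical; base R's <code>tapply()</code> stands in for <code>favstats()</code> so the lines run without any extra packages.

```r
# Hypothetical toy data (NOT the class data)
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess     = c(200, 500, 800, 1000, 3000, 5000)
)

# Fit the two-group model, as in the notebook
toy_condition <- lm(guess ~ condition, data = toy)

# Mean of guess within each condition
group_means <- tapply(toy$guess, toy$condition, mean)
group_means

# The distinct predictions the model makes
model_predictions <- sort(unique(round(unname(predict(toy_condition)), 6)))
model_predictions
```

Both printouts should show the same two numbers, one per condition.
</div>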

<div class="guided-notes">
    <h3>4.7 Use the predict() function to generate predictions for the condition model; compare to predictions for the empty model.</h3>
</div>

In [None]:
# sample response

# generate predictions from the empty model and from the condition model
predict(empty_model)
predict(condition_model)

<div class="discussion-question">
<b>Key Discussion Question:</b> What do you notice about the predictions of the empty model compared to those of the condition model? What is similar? What is different?
</div>

<div class="teacher-note">
    
<b>Sample Responses:</b>

- The empty model makes the same prediction for everyone (1946, the mean).
- The condition model makes two different predictions (786 and 3043).
- Students may wonder why there are so many predictions. There are the same number of predictions from each model as there are students (72). Why? The following questions explore more about what the predict function does. (They might remember from before that the predict function makes a prediction for each row in the data frame.)
    
</div>
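
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> If students are puzzled by the long printout, counting the predictions can help. The sketch below uses a hypothetical six-row data frame (not the class data) to show that <code>predict()</code> returns one prediction per row, even though the number of <i>distinct</i> predictions is only one or two.

```r
# Hypothetical toy data (NOT the class data)
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess     = c(200, 500, 800, 1000, 3000, 5000)
)

toy_empty     <- lm(guess ~ NULL, data = toy)
toy_condition <- lm(guess ~ condition, data = toy)

# One prediction per row of the data frame
n_rows          <- nrow(toy)
n_pred_empty    <- length(predict(toy_empty))
n_pred_condition <- length(predict(toy_condition))

# Number of distinct predictions (rounded to avoid tiny floating-point differences)
n_unique_empty     <- length(unique(round(predict(toy_empty), 6)))
n_unique_condition <- length(unique(round(predict(toy_condition), 6)))

c(rows = n_rows, empty = n_pred_empty, condition = n_pred_condition)
c(unique_empty = n_unique_empty, unique_condition = n_unique_condition)
```
</div>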

### 4.8 Use the condition model to generate predictions for each person in the data set, and save their predictions in a new variable (called `condition_predict`).

In [None]:
# sample response

# save the condition model predictions in a new variable called condition_predict
anchor_data$condition_predict <- predict(condition_model)
# print out the top 6 rows of the updated data frame to verify what you did
head(anchor_data)

### 4.9 Does the new variable seem to make sense? Is it what you expected? What do you notice about it?

<div class="teacher-note">
    
<b>Sample Responses:</b>

- Yes. This is what the `predict()` function does: it generates one prediction for each row in the data frame. 
- Ask students: How does the predict function know when to predict 786 and when to predict 3043? (Answer: By looking at which condition they were in. Remember, this is the **guess = condition + other stuff** model.)
    
</div>

## 5. Understanding the parameter estimates
We have referred to the empty model as a one-parameter model. The reason is that we use the data to estimate one parameter, $b_0$, which is the mean of the distribution of the outcome variable. 

As we have seen, the `condition_model` generates two different predictions, one for each group. To do this it will need to use two parameters ($b_0$ and $b_1$). Let's see how they work.

### 5.1 Print out the model estimates for the condition model
Which of these estimates do you think is $b_0$? Which is $b_1$?

In [None]:
# sample response

# print out parameter estimates for the condition model
condition_model

<div class="guided-notes">
    <h3>5.2 Label the parameter estimates in the output with the appropriate GLM notation</h3> Label the parameter estimates for the condition model as either $b_0$ or $b_1$.</div>

<div class="discussion-question">
<b>Key Discussion Question:</b> Are these the estimates you expected? Are they the same as the two model predictions? 
</div>

<div class="teacher-note">

<b>Sample Responses:</b>
    
- The 786 is the prediction for the ascending condition. It is also the mean of that condition.
- Some students will (incorrectly) say that 2256 is the mean of the descending condition. Point out that the mean of the descending condition (and the model's prediction for the descending condition) was 3043.
    
<b>Teacher Note:</b> Make sure students notice that 3043 is a model prediction but <i>not</i> one of the parameter estimates, and that 2256 is a parameter estimate but <i>not</i> one of the model predictions.
</div>

<div class="guided-notes">
    <h3>5.3 Label $b_0$ and $b_1$ in the graph </h3>
See if you can figure out what each parameter estimate represents.</div>

In [None]:
# just run this code and look at the graph
b0 <- b0(condition_model)
b1 <- b1(condition_model)

gf_jitter(guess ~ condition, data = anchor_data, width = .1) %>%
  gf_model(condition_model, color = "black") +
  # drawing a line segment
  annotate("segment", x = 1.5, y = b0, xend = 1.5, yend = (b0+b1), color = "black",
    arrow = arrow(ends = "both", angle = 90, length = unit(.2,"cm")))  + 
  # putting guidelines onto plot
  scale_y_continuous(breaks=seq(0,10000,1000)) 

<div class="guided-notes">
<h3>5.4 Write a set of instructions (just in words) for how to use the parameter estimates ($b_0$ and $b_1$) to generate the predictions of the condition model</h3>
Remember, your instructions must generate different predictions for someone in the ascending condition and someone in the descending condition.
</div>

<div class="teacher-note">
    
<b>Teacher Note:</b> 

- You might ask students to discuss with a partner, then share with the class. In the end, you want to converge on one idea that you can write here and students can write in the guided notes. 
- We suggest something like: "If someone is in the ascending condition predict $b_0$. If someone is in the descending condition, predict $b_0+b_1$."
</div>
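
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> The suggested instructions can be turned into code and compared with what <code>predict()</code> returns. This is a sketch on a hypothetical toy data set (not the class data); the names <code>b0_toy</code>, <code>b1_toy</code>, and <code>by_rule</code> are made up for the example.

```r
# Hypothetical toy data (NOT the class data)
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess     = c(200, 500, 800, 1000, 3000, 5000)
)

toy_condition <- lm(guess ~ condition, data = toy)

# Extract the two parameter estimates by name
b0_toy <- coef(toy_condition)[["(Intercept)"]]
b1_toy <- coef(toy_condition)[["conditiondescending"]]

# The rule in words: ascending -> b0; descending -> b0 + b1
by_rule <- ifelse(toy$condition == "descending", b0_toy + b1_toy, b0_toy)

# Compare the rule's predictions with the model's own predictions
by_rule
unname(predict(toy_condition))
```

The two printed vectors should agree row by row, which shows that the verbal rule and the fitted model are the same function.
</div>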

## 6. Writing the Group Model with GLM Notation
We've learned how to write the empty model in GLM notation: 
$$Y_i=b_0+e_i$$

Following the **DATA = MODEL + ERROR** idea, the model part is $b_0$, which is the prediction we give for every observation in the empty model. If we just want to represent the function that generates the prediction, we can use this notation:

$$\text{MODEL prediction}=b_0$$

How do we write a model that generates a different prediction depending on condition? To do this we need a variable, which we will call $X_i$. Here's the GLM notation for the group model:

$$\text{MODEL prediction}=b_0+b_1X_i$$

<div class="guided-notes">
    <h3>6.1 Write two versions of the condition model in GLM notation, one filling in the actual parameter estimates</h3></div>

<div class="teacher-note">
<b>Sample Responses:</b> 
    
- $b_0+b_1X_i$ 

- $786.1+2256.9X_i$
</div>

<div class="guided-notes">
    <h3>6.2: What would this model predict using the parameter estimates ($b_0$ and $b_1$) if $X_i=0$? What about if $X_i=1$?</h3></div>

<div class="teacher-note">
<b>Sample Responses:</b> 

- $786.1+2256.9(0) = 786.1$
- $786.1+2256.9(1) = 3043$
</div>
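
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> The same arithmetic can be written as a small R function, which may help students see the model as a function of $X_i$. The estimates 786.1 and 2256.9 come from the sample response above; the function name <code>predict_guess</code> is made up for this example.

```r
# Parameter estimates taken from the sample response above
b0_est <- 786.1
b1_est <- 2256.9

# MODEL prediction = b0 + b1 * X_i
predict_guess <- function(x) {
  b0_est + b1_est * x
}

# Predictions for X_i = 0 and X_i = 1
predict_guess(c(0, 1))
```
</div>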

<div class="discussion-question">
<b>Key Discussion Questions:</b> The $b_0$ and $b_1$ are parameter estimates; the $X_i$ represents the explanatory variable. (You can tell it is a variable because it has a subscript $i$.) What does it tell us about a person if their value of $X_i$ is 0? What about if $X_i$ is 1?
</div>

<div class="teacher-note">
<b>Sample Responses:</b> 

- When X was 0, it led to the ascending group prediction (786). So we can assume X=0 means ascending.
- When X was 1, it led to the descending group prediction (3043). So we can assume X=1 means descending.
     
</div>

In [None]:
# run this cell
condition_model

<div class="discussion-question">
<b>Key Discussion Questions:</b> 
    
Look again at the parameter estimates for the condition model. 
- Where do you see $b_0$? $b_1$? Where do you see $X_i$? 
- Why do you think one of the parameter estimates is labeled <b>conditiondescending</b>?
</div>

<div class="teacher-note">
<b>Sample Responses:</b> 

- Notice that `conditiondescending` is the label printed above the $b_1$ estimate. Another way to think about $X_i$ is as $\text{conditiondescending}_i$. When conditiondescending = 1, it means "descending". When conditiondescending = 0, it means "not descending" (that is, ascending).
     
</div>
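
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> To show students the 0/1 values of $X_i$ directly, one option is to look at the columns R builds when it fits the model. The sketch below uses base R's <code>model.matrix()</code>, which is not used elsewhere in this notebook, on a hypothetical toy data set (not the class data).

```r
# Hypothetical toy data (NOT the class data)
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess     = c(200, 500, 800, 1000, 3000, 5000)
)

toy_condition <- lm(guess ~ condition, data = toy)

# The columns R actually used to fit the model
design <- model.matrix(toy_condition)

# Save the conditiondescending column next to the data as X_i
toy$X_i <- design[, "conditiondescending"]
toy
```

Each row in the ascending condition should show `X_i` equal to 0, and each row in the descending condition should show `X_i` equal to 1.
</div>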

<div class="guided-notes">
    <h3>6.3: Put it all together! Label the plot on the guided notes.</h3></div>

In [None]:
# just run this code and look at the graph
b0 <- b0(condition_model)
b1 <- b1(condition_model)

gf_jitter(guess ~ condition, data = anchor_data, width = .1) %>%
  gf_model(condition_model, color = "black") +
  # drawing a line segment
  annotate("segment", x = 1.5, y = b0, xend = 1.5, yend = (b0+b1), color = "black",
    arrow = arrow(ends = "both", angle = 90, length = unit(.2,"cm")))  + 
  # putting guidelines onto plot
  scale_y_continuous(breaks=seq(0,10000,1000)) 

<div class="guided-notes">
    <h3>6.4: Draw the empty model on the same graph. </h3>
</div>

<div class="discussion-question">
<b>Key Discussion Questions:</b> In what sense are the empty model and the condition model both models? Connect to <b>DATA = MODEL + ERROR</b>.
</div>

<div class="teacher-note">
<b>Sample Responses:</b> 

- Both are functions that make a prediction for each observation (although the empty model makes only one unique prediction, the two-group model makes two).
- Both make use of the mean (the empty model uses the "grand" mean; the condition model uses the mean of each group).
- Both can be fit (parameters estimated) using the lm() function in R.
- Both can be overlaid on a plot using gf_model().
- Both can be used to generate predictions using the predict() function in R.
- Both can be represented in GLM notation.
   
</div>
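
<div class="teacher-note">
<b>Teacher Note (optional sketch):</b> To connect both models to <b>DATA = MODEL + ERROR</b>, the code below rebuilds each data value from a prediction plus a residual, then compares how much error each model leaves over. It is a sketch on a hypothetical toy data set (not the class data); using the sum of squared residuals as the measure of error is an assumption made here for illustration.

```r
# Hypothetical toy data (NOT the class data)
toy <- data.frame(
  condition = c("ascending", "ascending", "ascending",
                "descending", "descending", "descending"),
  guess     = c(200, 500, 800, 1000, 3000, 5000)
)

toy_empty     <- lm(guess ~ NULL, data = toy)
toy_condition <- lm(guess ~ condition, data = toy)

# DATA = MODEL + ERROR, row by row, for each model
rebuilt_empty     <- unname(predict(toy_empty) + resid(toy_empty))
rebuilt_condition <- unname(predict(toy_condition) + resid(toy_condition))

all.equal(rebuilt_empty, toy$guess)
all.equal(rebuilt_condition, toy$guess)

# Total squared error left over by each model
ss_empty     <- sum(resid(toy_empty)^2)
ss_condition <- sum(resid(toy_condition)^2)
c(empty = ss_empty, condition = ss_condition)
```

Both models reproduce the data exactly once the error is added back; in this toy example the condition model does so with less leftover error.
</div>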