# Adding an Explanatory Variable to a Model
## Chapter 7.1-7.4 In-Class Guide

In [None]:
# run this to set up the notebook
suppressMessages(library(coursekata))
suppressMessages(library(gridExtra))
# get the data
anchor_data <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vR8CPolRTYxJ6eCszgpgcGvuQ6tyFNoraBbFrFXEbKFgZgouLAlrgEC6wyRqWwSZPTmS2Xpd09P6G9y/pub?gid=1344310652&single=true&output=csv") 

In [1]:
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

## 1. Data Collection Activity

<br>
<div style="margin: 0 auto; font-family: Arial, sans-serif; border: 2px solid black; width: 380px; height: 220px; background-color: #F2F2F2; padding: 10px; box-sizing: border-box; font-size: 12pt; line-height: 1.4; font-weight: bold; text-align: left;">
  On the other side of this card is a math problem. When you are told to do so, turn over the card and look at it for 5 seconds. Then turn the card back over and write your estimate of the answer in the space below.<br><br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;___________________________
</div>

### 1.1 Everyone was given the same numbers to multiply. Do you expect all their guesses to be the same? Why or why not?

## 2. Explore Variation in `guess`
We have collected some data from you. We have a similar data set collected from a few other classes of students; we will start by analyzing that.

### 2.1 Run the code below to create a jitter plot to visualize variation in guesses

In [None]:
# run this code; no need to edit
gf_jitter(guess ~ "x", data = anchor_data, width = .1)

### 2.2 Look back at the code. What is the outcome variable? What is the data frame?

### 2.3 If we randomly chose one more student and added them to the data set, what would you predict their guess would be?

## 3. Review the Empty Model
### 3.1 Fit the empty model of `guess`; save it as `empty_model`; print out the model

In [None]:
# fit and save the empty_model of guess


### 3.2 What does the 1946 represent?

<div class="guided-notes">
    <h3>3.3 Draw in the empty model prediction on the graph</h3>
</div>

<div class="guided-notes">
    <h3>3.4 Write the code to overlay the <code>empty_model</code> on the jitter plot</h3>
</div>

In [None]:
# add the empty_model to the plot
gf_jitter(guess ~ "x",  width = .1, data = anchor_data) 


### 3.5 Does this look like your drawing of the empty model prediction on the jitter plot? 

## 4. Adding an Explanatory Variable to the Model

The empty model isn't very good: there is a lot of error around the model prediction. We might be able to make a better model (i.e., one with less error) if we had an explanatory variable.

It turns out we do have a possible explanatory variable in our data that might help us make a better model. Run the code below to check out the variable called `condition`. Half the students in our class were randomly assigned the problem 1x2x3x4x5x6x7x8 (the `ascending` condition). The other half got 8x7x6x5x4x3x2x1 (the `descending` condition).

In [None]:
# run this code
head(anchor_data)

<div class="discussion-question">
<b>Key Discussion Question:</b> Do you think condition would have an impact on guess? Why or why not?
</div>

<div class="guided-notes">
    <h3>4.1 Write two word equations: one to represent the hypothesis that condition would cause variation in guess; the other to represent the idea that it would <i>not</i> have an impact.</h3> Where would each word equation go on your guided notes? Which represents the empty model, and which the condition model?
</div>

### 4.2 Make a jitter plot to explore the effect of `condition` on `guess`.

In [None]:
# make a jitter plot


<div class="discussion-question">
<b>Key Discussion Question:</b> Does condition explain some of the variation in guess? (If you know what condition a student is in, could you make a better prediction of what their guess would be?) What do you think the model predictions would be for the condition model?
</div>

<div class="guided-notes">
    <h3>4.3 Draw in the condition model predictions on the graph</h3> Draw your prediction lightly; you may want to revise it later.
</div>

<div class="guided-notes">
    <h3>4.4 Fit the condition model using R</h3>
On your guided notes, find the code we used to fit the empty model (<b>guess = mean + other stuff</b>). Try to figure out the code to fit and save the condition model (<b>guess = condition + other stuff</b>).
</div>

In [None]:
# write code to fit (and save) the condition model
condition_model <- 

<div class="guided-notes">
    <h3>4.5 Add some code to overlay the condition_model on the jitter plot</h3>
</div>

In [None]:
# add the model
gf_jitter(guess ~ condition, data = anchor_data, width = .1) 


### 4.6 What is the model prediction for students in the ascending group? In the descending group? Are they close to what you expected?

<div class="guided-notes">
    <h3>4.7 Use the predict() function to generate predictions for the condition model; compare to predictions for the empty model.</h3>
</div>

In [None]:
# code here


<div class="discussion-question">
<b>Key Discussion Question:</b> What do you notice about the predictions of the empty model compared to those of the condition model? What is similar? What is different?
</div>

### 4.8 Use the condition model to generate predictions for each person in the data set, and save their predictions in a new variable (called `condition_predict`).

In [None]:
# save the condition model predictions in a new variable called condition_predict

# print out the top 6 rows of the updated data frame to verify what you did



### 4.9 Does the new variable seem to make sense? Is it what you expected? What do you notice about it?

Teacher note: students should see that both models make predictions, but the condition model makes two different predictions, one for each group.

## 5. Understanding the Parameter Estimates
We have referred to the empty model as a one-parameter model. The reason is that we use the data to estimate one parameter, $b_0$, which is the mean of the distribution of the outcome variable. 

As we have seen, the `condition_model` generates two different predictions, one for each group. To do this it will need to use two parameters ($b_0$ and $b_1$). Let's see how they work.

### 5.1 Print out the model estimates for the condition model
Which of these estimates do you think is $b_0$? Which is $b_1$?

In [None]:
# print out parameter estimates


<div class="guided-notes">
    <h3>5.2 Label the parameter estimates in the output with the appropriate GLM notation</h3> Label the parameter estimates for the condition model as either $b_0$ or $b_1$.</div>

<div class="discussion-question">
<b>Key Discussion Question:</b> Are these the estimates you expected? Are they the same as the two model predictions? 
</div>

<div class="guided-notes">
    <h3>5.3 Label $b_0$ and $b_1$ in the graph </h3>
See if you can figure out what each parameter estimate represents.</div>

In [None]:
# just run this code and look at the graph
b0 <- b0(condition_model)
b1 <- b1(condition_model)

gf_jitter(guess ~ condition, data = anchor_data, width = .1) %>%
  gf_model(condition_model, color = "black") +
  # drawing a line segment
  annotate("segment", x = 1.5, y = b0, xend = 1.5, yend = (b0+b1), color = "black",
    arrow = arrow(ends = "both", angle = 90, length = unit(.2,"cm")))  + 
  # putting guidelines onto plot
  scale_y_continuous(breaks=seq(0,10000,1000)) 

<div class="guided-notes">
<h3>5.4 Write a set of instructions (just in words) for how to use the parameter estimates ($b_0$ and $b_1$) to generate the predictions of the condition model</h3>
Remember, your instructions must generate different predictions for someone in the ascending condition and someone in the descending condition.
</div>

$b_0$... $b_1$...



## 6. Writing the Group Model with GLM Notation
We've learned how to write the empty model in GLM notation: 
$$Y_i=b_0+e_i$$

Following the **DATA = MODEL + ERROR** idea, the model part is $b_0$, which is the prediction we give for every observation in the empty model. If we just want to represent the function that generates the prediction, we can use this notation:

$$\text{MODEL prediction}=b_0$$
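
As a quick review of how **DATA = MODEL + ERROR** works for the empty model, here is a minimal sketch. It uses a tiny made-up data frame (`toy`, with three invented guesses that are not from `anchor_data`), and the names `toy_empty`, `toy_b0`, and `error` are chosen here only for illustration.

```r
# Toy data, made up for illustration only (not values from anchor_data)
toy <- data.frame(guess = c(100, 400, 1000))

# Fit the empty model to the toy data
toy_empty <- lm(guess ~ NULL, data = toy)

# MODEL: the single parameter estimate is the mean of the outcome
toy_b0 <- coef(toy_empty)[[1]]
all.equal(toy_b0, mean(toy$guess))

# ERROR: what is left over after subtracting the model prediction
toy$error <- unname(resid(toy_empty))

# DATA = MODEL + ERROR: adding the two pieces recovers each original guess
all.equal(toy$guess, toy_b0 + toy$error)
```

Every row gets the same MODEL part; only the ERROR part differs from person to person.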

How do we write a model that generates a different prediction depending on condition? To do this we need a variable, which we will call $X_i$. Here's the GLM notation for the group model:

$$\text{MODEL prediction}=b_0+b_1X_i$$

<div class="guided-notes">
    <h3>6.1 Write two versions of the condition model in GLM notation: one using $b_0$ and $b_1$, and one filling in the actual parameter estimates</h3></div>

<div class="guided-notes">
    <h3>6.2 What would this model predict using the parameter estimates ($b_0$ and $b_1$) if $X_i=0$? What about if $X_i=1$?</h3></div>

<div class="discussion-question">
<b>Key Discussion Questions:</b> The $b_0$ and $b_1$ are parameter estimates; the $X_i$ represents the explanatory variable. (You can tell it is a variable because it has a sub-i.) What does it mean about a person if their value of $X_i$ is 0? What about if it is 1?
</div>

In [None]:
# run this cell
condition_model

<div class="discussion-question">
<b>Key Discussion Questions:</b> 
    
Look again at the parameter estimates for the condition model. 
- Where do you see $b_0$? $b_1$? Where do you see $X_i$? 
- Why do you think one of the parameter estimates is labeled <b>conditiondescending</b>?
</div>

<div class="guided-notes">
    <h3>6.3 Put it all together! Label the plot on the guided notes.</h3></div>

In [None]:
# just run this code and look at the graph
b0 <- b0(condition_model)
b1 <- b1(condition_model)

gf_jitter(guess ~ condition, data = anchor_data, width = .1) %>%
  gf_model(condition_model, color = "black") +
  # drawing a line segment
  annotate("segment", x = 1.5, y = b0, xend = 1.5, yend = (b0+b1), color = "black",
    arrow = arrow(ends = "both", angle = 90, length = unit(.2,"cm")))  + 
  # putting guidelines onto plot
  scale_y_continuous(breaks=seq(0,10000,1000)) 

<div class="guided-notes">
    <h3>6.4 Draw the empty model on the same graph.</h3>
</div>

<div class="discussion-question">
<b>Key Discussion Questions:</b> In what sense are the empty model and the condition model both models? Connect to <b>DATA = MODEL + ERROR</b>.
</div>