# What Happens If Descending Comes First? (COMPLETE)
## Chapter 7.1-7.4 Focus

In [None]:
# run this to set up the notebook
suppressMessages(library(coursekata))
suppressMessages(library(gridExtra))
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))
# get the data
anchor_data <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vR8CPolRTYxJ6eCszgpgcGvuQ6tyFNoraBbFrFXEbKFgZgouLAlrgEC6wyRqWwSZPTmS2Xpd09P6G9y/pub?gid=1344310652&single=true&output=csv") 

<div class="teacher-note">
    <b>Teacher Note:</b> The purpose of this focus activity is to deepen students' understanding of how parameters are estimated and used for the two-group model. It is intended to be used after the 7.1-7.4 Overview notebook. Students should work in Jupyter, either individually or in pairs.
</div>

## 1. The Mental Multiplication Study

You may have participated in this study: Half of a class of students, randomly chosen, were presented with a multiplication problem in *ascending* order (1x2x3x4x5x6x7x8); the other half saw it in *descending* order (8x7x6x5x4x3x2x1). Students were given 5 seconds to view the problem and then asked to write down their best guess as to the answer. The answer, of course, is the same regardless of order. However, students' guesses differed markedly between the two groups.

Data are stored in a data frame called `anchor_data`. The dataset has 72 students and two variables:

- `condition` - a categorical variable coded as "ascending" or "descending"
- `guess` - a numeric variable with each student's guess

In [None]:
# run some code to make sure you have the data
str(anchor_data)

<div class="teacher-note">
    <b>Teacher Note:</b> The purpose of the next section is to review the two-group model. Students will visualize the model, fit the model, interpret the parameter estimates, then use the parameter estimates to make predictions.
</div>

## 2. Review the `condition` model of `guess`

### 2.1 Visualize the model:  `guess` = `condition` + other stuff
Use a jitter plot to visualize the effect of condition on guess. Try different values for `width =` to make the difference between the two conditions more visible.

In [None]:
# code here
gf_jitter(guess ~ condition, data=anchor_data, width=.2)

### 2.2 Fit the `condition` model
Fit the condition model of guess. Save the model as `condition_model`. Print out the parameter estimates, and overlay the model on the jitter plot.

In [None]:
# fit and save the model
condition_model <- lm(guess ~ condition, data=anchor_data)
# print out the parameter estimates
condition_model
# overlay the model on the jitter plot
gf_jitter(guess ~ condition, data=anchor_data, width=.2) %>%
  gf_model(condition_model)

### 2.3 How do the parameter estimates generate the model predictions?
The best-fitting model can be written like this (first without the estimates, and then with the estimates filled in):

(1) $Y_i=b_0+b_1X_i$ <br>
(2) $Y_i=786.1+2257.9*X_i$

Where in the graph do you see $b_0$? Where do you see $b_1$? Write your answers in the space below.

b0 is the mean of the ascending group. b1 is the difference between the means of the descending and ascending groups, i.e., the adjustment you must make from the prediction for ascending to get the prediction for descending.

Why is the $b_1$ parameter estimate (2257.9) labeled as "conditiondescending" in the model output? Write your answer in the space below.

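It's the adjustment you must make from the reference group (b0) to get to the descending group. R builds the label from the variable name (`condition`) followed by the group the adjustment applies to (`descending`), giving "conditiondescending".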
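
The relationship between the parameter estimates and the group means can be checked directly. The sketch below uses a small made-up data frame called `toy` (the values are hypothetical and are not taken from `anchor_data`) and only base R functions. It compares the group means with the estimates from the two-group model.

```r
# Toy data: made-up guesses, NOT the values in anchor_data
toy <- data.frame(
  condition = rep(c("ascending", "descending"), each = 3),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

# Group means computed directly from the data
group_means <- tapply(toy$guess, toy$condition, mean)
group_means

# Parameter estimates from the two-group model
toy_model <- lm(guess ~ condition, data = toy)
b <- coef(toy_model)
b

# b0 should match the ascending mean;
# b0 + b1 should match the descending mean
b[["(Intercept)"]]
b[["(Intercept)"]] + b[["conditiondescending"]]
```

The same comparison could be run on `anchor_data` by swapping it in for `toy`.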

## 3. Predict: What would happen if you switched the order?
When R fits the condition model it arbitrarily assigns $X_i$ to be 0 for students in the ascending condition and 1 for students in the descending condition. This makes the ascending group the <i>reference group</i>. $b_0$ is the model prediction for the reference group; $b_1$ is the amount that must be added to the reference group prediction to get the other group's (i.e., the descending group) prediction.
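
One way to see the 0/1 coding of $X_i$ is to look at the design matrix R builds for the model. This is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not from `anchor_data`), using base R's `model.matrix()`.

```r
# Toy data: made-up guesses, NOT the values in anchor_data
toy <- data.frame(
  condition = c("ascending", "descending", "ascending", "descending"),
  guess = c(100, 1000, 300, 3000)
)

# The order of the levels determines the reference group (the first level)
levels(factor(toy$condition))

# The design matrix shows the X values R uses when fitting the model:
# the conditiondescending column is the 0/1 coded X_i
X <- model.matrix(guess ~ condition, data = toy)
X
```

Rows for the reference group have 0 in the `conditiondescending` column, so their prediction is just $b_0$; rows for the other group have 1, so $b_1$ is added.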

### 3.1 Do you think the *parameter estimates* would change if the descending group were made the reference group? Why or why not?

Yes, they would change. The b0 estimate would now be the mean of the descending group.

### 3.2 Do you think the *model predictions* would change if the descending group were made the reference group? Why or why not?

No, the model predictions would be the same - still the means of the two groups.

## 4. Test your predictions: Let's get R to switch the order of the groups
By default, R puts the category labels in alphabetical order and makes the one that comes first the reference group. Because ascending comes before descending, ascending is the reference group. One way to get R to make descending the reference group is to change the category labels so that ascending comes second in alphabetical order. For example, we could change the label from `ascending` to `xascending`. (Putting an "x" in front of ascending makes it come later in the alphabet. Any other letter that comes after "d" would also work.) We've done that in the code cell below, saving the results in a new variable called `xcondition`.

In [None]:
# run this cell
# create new variable xcondition
anchor_data$xcondition <- recode(anchor_data$condition, "ascending" = "xascending")
# checks to see if the new variable looks right
str(anchor_data)
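
Relabeling is not the only way to switch the reference group. As an alternative that this notebook does not use, base R's `relevel()` sets the reference level of a factor without changing its labels. The sketch below assumes a made-up data frame `toy` (hypothetical values, not from `anchor_data`); the variable name `condition_rl` is also just for illustration.

```r
# Toy data: made-up guesses, NOT the values in anchor_data
toy <- data.frame(
  condition = rep(c("ascending", "descending"), each = 3),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

# Convert to a factor, then make "descending" the reference level
toy$condition_rl <- relevel(factor(toy$condition), ref = "descending")
levels(toy$condition_rl)

# Fit the two-group model with the releveled variable
releveled_model <- lm(guess ~ condition_rl, data = toy)
coef(releveled_model)
```

With this approach the second estimate is labeled with the original group name (`condition_rlascending`) rather than a modified label such as `xascending`.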

### 4.1 Refit the two-group model, but use `xcondition` as the explanatory variable

In [None]:
# create xcondition_model
xcondition_model <- lm(guess ~ xcondition, data=anchor_data)
# print out the model estimates
xcondition_model

### 4.2 Interpret the new parameter estimates
What does the 3043 mean? What about the -2257?

3043 is the mean of the descending group, which is now the reference group. -2257 is the amount that must be added to the descending mean to get the mean of the ascending group. (Because it is negative, it will yield a lower mean, which is as expected.)

### 4.3 Write the `xcondition_model` in GLM notation
Replace the $b_0$ and $b_1$ with the actual parameter estimates

$Y_i=3043 + (-2257)*X_i$

### 4.4 Do the model predictions change?
The `xcondition_model` is still a two-group model, which means it makes two different model predictions. Do these predictions change because we switched the order of the groups? Why or why not?

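No, they don't. The two predictions are still the means of the two groups, and the group means do not depend on which group is the reference group. Only the way the model arrives at them changes: it now starts at the descending mean and adds a negative adjustment to reach the ascending mean.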
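
A quick check of this claim is to fit both versions of the model and compare their predictions. The sketch below uses a made-up data frame `toy` (hypothetical values, not from `anchor_data`) and, to stay in base R, relabels with `ifelse()` instead of the `recode()` call used earlier in the notebook.

```r
# Toy data: made-up guesses, NOT the values in anchor_data
toy <- data.frame(
  condition = rep(c("ascending", "descending"), each = 3),
  guess = c(100, 200, 300, 1000, 2000, 3000)
)

# Relabel "ascending" so that it sorts after "descending"
toy$xcondition <- ifelse(toy$condition == "ascending", "xascending", toy$condition)

# Fit the model with each version of the explanatory variable
m1 <- lm(guess ~ condition, data = toy)
m2 <- lm(guess ~ xcondition, data = toy)

# The parameter estimates differ between the two models...
coef(m1)
coef(m2)

# ...but the prediction for each student is the same
same_predictions <- isTRUE(all.equal(unname(predict(m1)), unname(predict(m2))))
same_predictions
```

Replacing `toy` with `anchor_data`, `condition_model`, and `xcondition_model` would run the same comparison on the study data.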

<div class="teacher-note">
    <b>Teacher Note:</b> It's important to note that while the parameter estimates change based on the change in reference group, the actual predictions do not change. They still are the means of the two groups, and the means of the groups do not change.
</div>