# 11C: Anxiety in the ER

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

# This code will make sure the middle rows/columns don't get cut out (ellipsized) when you 
# print out a really large data frame (you can adjust the values for max rows/cols)
options(repr.matrix.max.rows=800, repr.matrix.max.cols=200)

#save modified version of the data with only the variables we need 
er_anxiety <- select(er, condition, age, gender, race, base_anxiety, later_anxiety, last_anxiety)

## 1.0: The Data

Let's look at the data (in a data frame called `er_anxiety`). Then I'll tell you a bit more about how it was collected.

In [None]:
head(er_anxiety)

**1.1:** What do you think these cases (rows) are?

#### About the Study

<img src="https://i.postimg.cc/qpYsC1xY/image.png" alt="a variety of therapy dogs" width = 60%>

Researchers were interested in the potential benefits of therapy dogs in easing anxiety, pain, and depression during emergency room visits. Medically stable adult patients visiting an emergency room were approached and randomly assigned to one of two conditions: 15 minutes of exposure to a certified therapy dog and handler (**Dog condition**), or usual care (**Control condition**). Patient-reported anxiety, pain, and depression were assessed on a 0–10 scale (10 = worst) at three time points:

- baseline (before the therapy dog)
- later (30 minutes after the therapy dog or control treatment)
- last (90 minutes after)

#### Study Procedure

<img src="https://i.postimg.cc/syjV5VSK/image.png" alt="Diagram of Dog Therapy Study procedure" width=800>

## Motivating Question: Are therapy dogs helpful in the emergency room (ER)?

#### Key Variables

For today, we're going to focus on a few key variables having to do with patients' demographic characteristics and anxiety levels:

- `condition`: The research condition the patient was randomly assigned to (Dog or Control)
- `age`: The age of the patient
- `gender`: The gender of the patient
- `race`: The race of the patient
- `base_anxiety`: The baseline self-reported anxiety rating on a scale of 0-10 (10 = worst), before any exposure to a therapy dog
- `later_anxiety`: Anxiety rating 30 minutes after exposure to either the dog or the control treatment
- `last_anxiety`: Anxiety rating 90 minutes after exposure to treatment

##### Data Source: 

Research Paper: [Kline JA, Fisher MA, Pettit KL, Linville CT, Beck AM.](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0209232) Controlled clinical trial of canine therapy versus usual care to reduce patient anxiety in the emergency department. PLoS One. 2019 Jan 9;14(1):e0209232. doi: 10.1371/journal.pone.0209232. PMID: 30625184; PMCID: PMC6326463.



## 2.0: Explore Variation

**2.1:** Anxiety was measured at three different time points (`base_anxiety`, `later_anxiety`, and `last_anxiety`). For today, we will focus on the variable `later_anxiety` as our outcome variable. Make a visualization to explore the variation in `later_anxiety`.

**2.2:** The researchers were particularly interested in whether `condition` explains variation in anxiety. Make some visualizations to explore their hypothesis. Does `condition` make a difference in `later_anxiety`?

**Summary:** Just using our common sense, we can figure out some things about the DGP that generated this data. 

For example, `condition` couldn't have caused any difference we see between the two groups in `base_anxiety`, because baseline anxiety was measured before any dog therapy happened. Any difference between the groups at baseline is most likely due to random chance.

However, because these data were collected in an experiment (with random assignment to the two conditions), `condition` *could have* caused the differences we see between the two groups in `later_anxiety`. **But** it is important to keep in mind that randomness *could have* caused those differences as well. Later we will learn how to rule out randomness as a DGP.

## 3.0: Modeling Variation in `later_anxiety`

**3.1:** Let's keep exploring the hypothesis that `condition` could explain variation in `later_anxiety`. Find the best-fitting model and add it to the faceted histogram below. Also add it to the jitter plot below.


In [None]:
gf_histogram(~ later_anxiety, data = er_anxiety, binwidth = 1, fill = "orange") %>%
  gf_facet_grid(condition ~ .) 

In [None]:
gf_jitter(later_anxiety ~ condition, data = er_anxiety, width = .1, color = "darkorange3")

**3.2:** Write the best fitting model of **later_anxiety = condition + other stuff** in GLM notation. You can double click on this cell to copy the equation we have started for you below:

$Y_i = b_0 + b_1X_i + e_i$

*Notes on writing fancy mathematical notation:*
- You can write GLM using Ys and Xs: $Y_i = b_0 + b_1X_i + e_i$
- Or using the variable names: $lateranxiety_i = b_0 + b_1conditionDog_i + e_i$
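To connect the two versions of the notation: assuming R's default dummy coding with Control as the reference group (which the name `conditionDog` suggests, but check your own model output to confirm), $X_i$ is an indicator variable that records which condition patient $i$ was in:

$$X_i = \begin{cases} 0 & \text{if patient } i \text{ is in the Control condition} \\ 1 & \text{if patient } i \text{ is in the Dog condition} \end{cases}$$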

**3.3:** Interpret the parameter estimates. How do these numbers relate to the model shown in the graph?



**3.4:** What `later_anxiety` would the condition model predict for someone who got dog therapy? How about someone who didn't?

**3.5:** Why does the model predict lower anxiety for those in the dog condition?

## 4.0 - Simulating a Random DGP 

4.1 - What would the best-fitting models typically look like if the DGP were random? What would the $b_1$s usually look like? How much could they vary? Let's check it out.

Modify the code below. 

In [None]:
do(10) * b1(shuffle() ~ shuffle(condition), data = )

In [None]:
gf_jitter(shuffle() ~ condition, data = , color = "blue", size = 2) %>%
    gf_model(color = "navyblue") 
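
If it helps to see what shuffling is doing under the hood, here is a minimal base-R sketch. It does not use the course functions or the real data: the `toy` data frame and the helper `group_diff()` are made up purely for illustration. The idea is that randomly permuting the outcome breaks any real link to `condition`, so whatever group difference remains comes from randomness alone.

```r
# Toy data, invented for illustration only (NOT the er_anxiety data)
set.seed(11)
toy <- data.frame(
  condition = rep(c("Control", "Dog"), each = 20),
  anxiety   = round(runif(40, min = 0, max = 10))
)

# Hypothetical helper: difference in group means (Dog minus Control).
# With two groups and R's default dummy coding, this equals the b1
# estimate from lm(anxiety ~ condition).
group_diff <- function(outcome, group) {
  mean(outcome[group == "Dog"]) - mean(outcome[group == "Control"])
}

# Shuffle the outcome 10 times and recompute the difference each time.
# sample() with no size argument returns a random permutation.
shuffled_b1s <- replicate(10, group_diff(sample(toy$anxiety), toy$condition))
shuffled_b1s
```

Each run of the last two lines gives a different set of 10 values, which is the point: these are the kinds of $b_1$s a purely random DGP can produce.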

4.2 - What would the PREs look like from these models (from a random DGP)? Do you think they would be big? Small? Medium? Why? 
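
As a reminder while you think about this (this is the general definition of PRE for comparing a model to the empty model, not anything specific to this data set), PRE is the proportion of the empty model's error that the more complex model removes:

$$PRE = \frac{SS_{Total} - SS_{Error}}{SS_{Total}} = \frac{SS_{Model}}{SS_{Total}}$$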

4.3 - Let's generate $PRE$s (say, 1000 of them) from a random DGP. How do those PREs generally vary? Try creating a visualization of your distribution of PREs.

(Bonus: What is this distribution of PREs called?)

In [None]:
#modify this code 
SDoPRE <- do(1000) * PRE(shuffle() ~ , data = )

head(SDoPRE)

4.4 - Where would your sample PRE exist on the distribution of PREs? Try adding it to your visualization.

In [None]:
# Add our sample's PRE to the visualization with gf_point()

#save the sample PRE
sample_PRE <- PRE( ~ , data = )

#add it to our visualization 
#note: using this fill option will color the lower .95 of values 
gf_histogram(~PRE, data = SDoPRE, bins = 50, fill = ~lower(PRE, .95)) %>%
    gf_point(0 ~ sample_PRE, color = "red")

4.5 - Consider the values in the ANOVA table for the main model you have been working with so far. Which value do you think corresponds most closely to this statement:

> The probability of getting a PRE at least as large as the sample PRE, **if** there was no relationship between the variables in the DGP.

4.6 - Use `tally()` to see if "the proportion of PREs at least as large as the sample PRE, if there was no relationship between the variables in the DGP" from your sampling distribution really is similar to the number in the ANOVA table.
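
The counting idea behind this step can be sketched in base R on made-up data. Everything here is an assumption for illustration: the `toy` data frame is invented, and `pre_of()` is a hypothetical helper that relies on the fact that PRE for a model compared against the empty model is the same number as R-squared from `lm()`. It is not the course's `PRE()` function, just a stand-in.

```r
# Toy data, invented for illustration only (NOT the er_anxiety data)
set.seed(11)
toy <- data.frame(
  condition = rep(c("Control", "Dog"), each = 20),
  anxiety   = round(runif(40, min = 0, max = 10))
)

# Hypothetical helper: PRE of the group model relative to the empty model,
# taken from the R-squared that lm() reports
pre_of <- function(outcome, group) {
  summary(lm(outcome ~ group))$r.squared
}

# PRE for the toy sample as observed
sample_pre <- pre_of(toy$anxiety, toy$condition)

# 1000 PREs from shuffled outcomes, i.e. from a purely random DGP
shuffled_pres <- replicate(1000, pre_of(sample(toy$anxiety), toy$condition))

# Proportion of shuffled PREs at least as large as the sample PRE
p_sim <- mean(shuffled_pres >= sample_pre)
p_sim
```

Taking the mean of a TRUE/FALSE vector gives the proportion of TRUEs, which is the same quantity you would read off a `tally()` of the comparison when asking for proportions.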

4.7 - So what do you think? Evaluate your model against the empty model. What does this mean for the researchers' hypothesis?

## 5.0 - Extending these Ideas to *F*

PRE and F are very closely related. Both measure how much better the complex model explains variation in the outcome variable than the empty model does. We should end up with the same conclusions whether we use PRE or F.
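
One way to see the connection, using the standard ANOVA definitions (with $df_{Model}$ and $df_{Error}$ as the degrees of freedom shown in the ANOVA table): because $SS_{Model} = PRE \cdot SS_{Total}$ and $SS_{Error} = (1 - PRE) \cdot SS_{Total}$, F can be written entirely in terms of PRE.

$$F = \frac{MS_{Model}}{MS_{Error}} = \frac{SS_{Model}/df_{Model}}{SS_{Error}/df_{Error}} = \frac{PRE/df_{Model}}{(1 - PRE)/df_{Error}}$$

For a fixed sample size and model, a larger PRE always means a larger F, which is why the two statistics lead to the same conclusion.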

5.1 - To corroborate our intuitions, try creating a sampling distribution and histogram of *F* from shuffled data using `fval()` and `shuffle()`. Where does the sample F fall in this distribution?

Then use `tally()` to get the p-value from the simulated sampling distribution of *F*s.

## 6.0 - Conclusions 

6.1 - What conclusions can you draw regarding the researchers' hypothesis? Did being in the dog condition make a difference in later anxiety?

## 7.0 - Bonus: Working with a quantitative explanatory variable 

7.1 - The researchers also wondered whether a person's `base_anxiety` could help predict `later_anxiety`. Regardless of which condition someone was randomly assigned to, how well did their base anxiety predict their later anxiety?

Create this model and interpret the coefficients. 

7.2 - Try creating a sampling distribution and histogram of the PRE from shuffled data using `PRE()` and `shuffle()`. Where does the sample PRE from this new model fall in this distribution?

7.3 - Extend this to F. Try creating a sampling distribution and histogram of the F from shuffled data using `fval()` and `shuffle()`. Where does the sample F fall in this distribution?

7.4 - What do you think? Does `base_anxiety` help predict `later_anxiety`?

7.5 - What was similar in this process compared to using `condition` as our explanatory variable? What was different? 