## 10A: Can parasites control their hosts?

<img src="https://images.newscientist.com/wp-content/uploads/2017/05/04161216/fig.-3_metacercariae_ns.jpg" width=300>

The eye fluke: a parasite that infects fish and lives in their eyes.

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

exp1 <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vT0BJOSII2SFORMbCO9pnXZKCwMzjnBhcHD0QCziE1qbQJcsJz1wjHg3fTLYXfpiA9MICNV5S1IceDf/pub?gid=974536451&single=true&output=csv")
exp2 <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSIrbyNRhOUYD_4nn4xOtQ0tJjNaIn7X4RaqQTG9Zc5Vr3mreonLC4_lbt1FdWgvkW_pRJSvOxDjhNp/pub?gid=636600208&single=true&output=csv")

exp1$infected <- factor(exp1$infected)

### Intro: The Eye Fluke

Animals typically survive by avoiding predators, but this is a problem for parasites that need their hosts to be eaten in order to continue their life cycle.

Take, for instance, the common parasite, the eye fluke (*Diplostomum pseudospathaceum*), which has a life cycle that takes place in three different types of animals.

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/jnb_x0RhFXQq-image.png" alt="life cycle of eye fluke" width=600>

The eye fluke would really like it if the trout would get eaten by a bird (even if the trout wouldn't!).

Dr. Mikhail Gopko and colleagues conducted an experiment to see if parasite-infected trout would change their behavior in a way that would make them more likely to be eaten by birds (e.g., swim closer to the surface of the water).

## 1.0 - The Data

The data frame `exp1` contains data from this published study (https://link.springer.com/article/10.1007/s00265-017-2300-x). In the study, 46 lab-raised trout were randomly assigned either to be infected with eye flukes or not.

- `infected` whether the fish is in the experimentally infected (1) or control (0) group
- `fish_id` an identification number
- `fish_mass_g` mass of the fish in grams
- `infection_intensity` number of eye flukes in the fish's eye lens
- `depth` mean distance from the bottom of the aquarium, measured in cm (larger numbers indicate shallower swimming, that is, closer to the surface)
- `activity` number of gridlines crossed by the fish in a 4-minute period

In [None]:
# Take a look at the data frame

1.1 - If parasites indeed change their fishy hosts' behaviors, which of the variables above might be interesting outcomes to consider?

1.2 - Write a word equation to represent the researchers' hypothesis. 

## 2.0 - Explore Variation

**2.1 - Discussion:** Here is a visualization to help us explore this hypothesis. Based on the data you see, what do you think of the hypothesis?

In [None]:
gf_jitter(depth ~ infected, data = exp1, width = .2, height = .2, size = 4) 

**2.2 - Discussion:** Find this fish in the visualization above (we'll call it fish #3).

In [None]:
filter(exp1, fish_id == 3)

In [None]:
# a visualization that will color fish #3's dot in a different color
#gf_jitter(depth ~ infected, data = exp1, width = .2, height = .2, size = 4, color = ~fish_id == 3) 

2.3 - Could we have gotten a distribution of data like this if parasites didn't really affect swimming in the Data Generating Process? How would we represent that as a word equation?

(Could we try it in R somehow?)

In [None]:
# Modify this code to mimic a no-effect-of-parasites DGP
#gf_jitter(depth ~ infected, data = exp1, width = .2, height = .2, size = 4, color = ~fish_id == 3)

**2.4 - Discussion:** If eye flukes don't change their host's behavior, would fish #3 have acted differently if it had been infected?

## 3.0 - Model Variation in the Sample

3.1 - What's the best fitting model of the data based on the researchers' hypothesis? 

Specify and fit a formal model (in GLM format). 

$Y_i = ... + e_i$

3.2 - Interpret the best fitting estimates in that model by connecting them to the visualization below.

- $b_0$: 
- $b_1$:

In [None]:
gf_jitter(depth ~ infected, data = exp1, width = .2, height = .2, size = 4) %>%
    gf_model(depth ~ infected, data = exp1, color = "red")

## 4.0 - But what's the best model of the DGP?

4.1a - Is it possible that there is no difference between these two groups in the DGP? If so, how would we represent that hypothesis in GLM notation? 

4.1b - Is it possible that the researchers are right and infected fish really are different from uninfected fish in the DGP? If so, how would we represent that hypothesis in GLM notation? 

4.2 - What is our best guess for the $\beta_1$ in the two equations above? What would happen to the second model if $\beta_1 = 0$?

4.3 - Could the $\beta_1$ in the DGP be a different number than either of the two numbers above? What else could it be? 

## 5.0 - Simulating a DGP where there is no effect of parasite

We can simulate a DGP with a certain $\beta_1$ (such as $\beta_1 = 0$, no effect of parasite infection) and look at all the $b_1$s that it can produce. Then we can ask: Does our sample seem "unlikely" to have come from this DGP? If so, then we might reject the empty model of the DGP.
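
To make the idea concrete, here is a minimal base-R sketch of what such a simulation does. It does not use the course's `shuffle()` or `b1()` functions. Instead it uses `sample()` to permute the outcome, which breaks any link between depth and group. The data frame `toy` (with made-up depths) and the helper `diff_in_means()` are hypothetical stand-ins for illustration, not the study data or a coursekata function.

```r
# Hypothetical toy data: 6 control (0) and 6 infected (1) fish, made-up depths
set.seed(10)
toy <- data.frame(
  infected = factor(rep(c(0, 1), each = 6)),
  depth = c(10, 12, 11, 9, 13, 11, 14, 15, 13, 16, 12, 14)
)

# Hypothetical helper: mean depth of infected fish minus mean depth of controls.
# For a two-group model, this difference equals the b1 estimate.
diff_in_means <- function(y, group) {
  mean(y[group == 1]) - mean(y[group == 0])
}

# Difference observed in the toy data
observed <- diff_in_means(toy$depth, toy$infected)

# One shuffle: sample() without replacement randomly reorders the depths,
# so group membership can no longer be related to depth
one_shuffle <- diff_in_means(sample(toy$depth), toy$infected)

# Many shuffles approximate the sampling distribution of b1 when beta_1 = 0
shuffled <- replicate(1000, diff_in_means(sample(toy$depth), toy$infected))

observed
one_shuffle
mean(shuffled)
```

Running the last three lines a few times without `set.seed()` shows the same pattern as the real data: the observed difference stays fixed, while each shuffled difference varies around 0.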

5.1 - What kind of world does `shuffle()` mimic? Represent the shuffle DGP using GLM notation: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

5.2a - Run the code below a few times. Why does only one of the numbers change? 

In [None]:
sample_b1 <- b1(depth ~ infected, data = exp1)
sample_b1

b1(shuffle(depth) ~ infected, data = exp1)

5.2b - What does the shuffled `b1` mean?

5.3 - Which of the following are shuffled data? Which one is the real data?

(Also, roughly estimate the values of the shuffled `b1`s. How are they different from the sample `b1`?)

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/jnb_9mndF9qP-image.png" alt="a 3x3 panel of plots: eight of shuffled data and one of the real data">

5.4 - Here is code for a single shuffled $b_1$. Try creating a sampling distribution of 1000 shuffled $b_1$s. 

In [None]:
b1(shuffle(depth) ~ infected, data = exp1)

5.5 - Make a visualization of the sampling distribution of $b_1$ you created. What are the shape, center, and spread of this distribution?

## 6.0 - Bringing in our sample

6.1 - Where does our sample fall relative to this sampling distribution? Is it one of the "unlikely" samples?

In [None]:
sdob1 <- do(1000) * b1(shuffle(depth) ~ infected, data = exp1)

# Add our sample to the distribution
gf_histogram(~b1, data = sdob1, fill = ~middle(b1, .95), bins=100)

Let's recap what all this means using our [distribution triad diagram](https://docs.google.com/presentation/d/1PTEFZGFKLX6mDa3GeceXkkmFTNEOOy2k1K6Tg3InVas/copy).

We started off thinking, "Maybe there is no effect of parasites on swimming behavior. Fish are going to swim at the same depth whether they are infected or not." We call that the empty model (so $\beta_1=0$). This is what we simulated when we did all these shuffles.

But our real sample is an unlikely $b_1$ from the empty model of the DGP.
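
One way to put a number on "unlikely" is sketched below in base R. It checks whether a sample estimate falls outside the middle 95% of the simulated $b_1$s, and what proportion of simulated $b_1$s are at least as far from 0. The values here are assumptions for illustration only: `shuffled_b1` is a made-up stand-in for a column of shuffled estimates, and `sample_b1_toy` is a made-up sample estimate, not the result from `exp1`.

```r
set.seed(1)

# Hypothetical stand-in for 1000 shuffled b1s (centered on 0, like an empty-model DGP)
shuffled_b1 <- rnorm(1000, mean = 0, sd = 1)

# Hypothetical sample estimate (not the real value from exp1)
sample_b1_toy <- 2.5

# Cutoffs that bound the middle 95% of the shuffled b1s
cutoffs <- unname(quantile(shuffled_b1, c(0.025, 0.975)))

# TRUE if the sample estimate lands in one of the two 2.5% tails
in_tails <- sample_b1_toy < cutoffs[1] | sample_b1_toy > cutoffs[2]

# Proportion of shuffled b1s at least as far from 0 as the sample estimate
p_extreme <- mean(abs(shuffled_b1) >= abs(sample_b1_toy))

in_tails
p_extreme
```

A small proportion means that a DGP with $\beta_1 = 0$ rarely produces a $b_1$ as extreme as the sample's, which is the sense of "unlikely" used here.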

(Fill in the $\beta_1$ and $b_1$ values on the diagram and move the distributions around accordingly.)

6.2 - So what do you think about the empty model as a possible model of the DGP?

6.3 - Going back to the researchers, what does this mean for their hypothesis? 