# Salamander Growth in Mack Creek (Part II)
## Chapter 8.5-8.7 Modeling the DGP and Revisiting Randomness

In [None]:
# run this cell to set up the notebook
suppressMessages(library(coursekata))
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# this will pull in the giant_sal data
giant_sal <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8kCNPCY830PwnzJLnriG70NRApkgELQEJ_n2DY0-b8AgBhIAl2sH00RILA3Indfji8pX-baJu7_Z5/pub?gid=2069091587&single=true&output=csv")
 

## 1. The `giant_sal` data set: Salamander Growth in Mack Creek

<img src="https://lter.github.io/lterdatasampler/reference/figures/and_salamander.jpg" alt="Coastal Giant Salamander at Mack Creek study site" width="250" style="float: left; margin-right: 20px; height: auto;"> 

Biologists studying coastal giant salamanders in Mack Creek (Willamette National Forest, Oregon) wanted to know how forest harvesting affects salamander growth. Forest harvesting involves cutting down trees, which could change the creek environment in which the salamanders live and thus affect how large they grow.

To investigate this, they captured and measured the length of 60 coastal giant salamanders. Some salamanders were captured in areas where trees had recently been cut down (clear-cut), while others came from untouched old-growth forest. Your task is to explore this dataset, called `giant_sal`, to find out whether forest harvesting impacts salamander size.

### Variables in the `giant_sal` Data Frame:

- `year`: The year the salamander was caught and measured  
- `location`: The section of forest in which the salamander was caught (clear cut or old growth)
- `habitat`: Where in the creek the salamander lived, coded as `pools` (deeper, slower-moving parts of the creek) or `cascades` (shallow, rocky areas with faster-moving water)
- `length_mm`: Body length in millimeters, measured from the snout to the start of the tail (excluding the tail)

Note: This is a subset of data from [the Long Term Ecological Research (LTER) Network](https://lter.github.io/lterdatasampler/reference/index.html).

### 1.1 Review of the `giant_sal` data frame

In a previous notebook, we explored the following hypothesis:

**length_mm = location + other stuff**

Run the code below to review the data and the model.

$length\_mm = 62.30 - 14.77X_i + e_i$
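One way to read the fitted equation: $X_i$ is a 0/1 dummy variable, so the model makes only two predictions, one for each location. The sketch below is not part of the exercise. It simply plugs the two coefficients from the equation into the model. The labels `reference` and `other` are placeholders, because which location R codes as 0 depends on how `location` is stored (by default, the level that comes first alphabetically).

```r
# Coefficients copied from the fitted equation above
b0 <- 62.30
b1 <- -14.77

# X_i is a 0/1 dummy variable: 0 for the reference location, 1 for the other.
# "reference" and "other" are placeholder labels, not values from giant_sal.
x <- c(reference = 0, other = 1)

# Model prediction for each group (e_i is whatever is left over for each salamander)
predicted_length <- b0 + b1 * x
predicted_length
```

The two predictions are 62.30 mm and 47.53 mm, so $b_1$ is the difference between the two group predictions.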

In [None]:
# run this code

# look at the data frame
head(giant_sal)

# fit and save the model
location_model <- lm(length_mm ~ location, data = giant_sal)
location_model

# plot with the location model
gf_jitter(length_mm ~ location, data = giant_sal, width = .1) %>%
  gf_model(location_model, color="red")

### 2. How big is the effect of `location`?

Calculate the following measures of effect size and interpret each of them.
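As a reminder of how the four measures are defined, here is a minimal base R sketch on a small made-up data frame (`toy`, with columns `group` and `y`, which are hypothetical and not the salamander data). It computes each measure from the sums of squares, so you can see where the numbers come from. The course package may offer shorter ways to get the same values.

```r
# Made-up toy data (NOT the salamander data): two groups of five values
toy <- data.frame(
  group = rep(c("a", "b"), each = 5),
  y     = c(10, 12, 11, 13, 14, 15, 17, 16, 18, 19)
)

# Fit the empty model and the two-group model
empty_model <- lm(y ~ NULL, data = toy)
group_model <- lm(y ~ group, data = toy)

# Sums of squares: total (empty model), error (group model), and the reduction
ss_total <- sum(resid(empty_model)^2)
ss_error <- sum(resid(group_model)^2)
ss_model <- ss_total - ss_error

# Degrees of freedom: one extra parameter (b1) in the group model
df_model <- 1
df_error <- df.residual(group_model)

# b1: difference between the two group means, in the units of y
b1 <- coef(group_model)[["groupb"]]

# Cohen's d: b1 in units of the pooled within-group standard deviation
cohens_d <- b1 / sqrt(ss_error / df_error)

# PRE: proportion of total error removed by adding group to the model
pre <- ss_model / ss_total

# F: variance explained per parameter relative to variance left unexplained per df
f_ratio <- (ss_model / df_model) / (ss_error / df_error)

c(b1 = b1, cohens_d = cohens_d, pre = pre, f = f_ratio)
```

In this sketch Cohen's d keeps the sign of $b_1$. Some functions report its absolute value instead.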


#### 2.1 Measure and interpret effect size: $b_1$

In [None]:
# write code here


2.1 Response:

#### 2.2 Measure and interpret effect size: Cohen's d

In [None]:
# write code here


2.2 Response:

#### 2.3 Measure and interpret effect size: PRE

In [None]:
# write code here


2.3 Response:

#### 2.4 Measure and interpret effect size: F

In [None]:
# write code here


2.4 Response:

### 3. Could it be a DGP where `location` doesn't matter (where $\beta_1$ = 0)?

We have fit the model to our data to calculate $b_1$, which is our best estimate of $\beta_1$ (the effect of $X$ on $Y$).

But what we really want to know is the true $\beta_1$ in the DGP. We cannot observe $\beta_1$ directly, but we can ask how likely a $b_1$ as extreme as ours would be if the data came from a DGP where $\beta_1 = 0$ (i.e., a DGP where `location` has no effect on `length_mm`).
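The logic of simulating such a DGP can be sketched in base R on made-up data (the data frame `toy` and its columns `group` and `y` are hypothetical, and the seed is arbitrary). Shuffling the outcome breaks any real link between group and outcome, so every shuffled $b_1$ is one that a DGP with $\beta_1 = 0$ could have produced.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Made-up toy data (NOT the salamander data): two groups of five values
toy <- data.frame(
  group = rep(c("a", "b"), each = 5),
  y     = c(10, 12, 11, 13, 14, 15, 17, 16, 18, 19)
)

# b1 from the data as collected
sample_b1 <- coef(lm(y ~ group, data = toy))[["groupb"]]

# Shuffle y 1000 times and refit the model each time.
# After shuffling, group membership is unrelated to y by construction.
shuffled_b1 <- replicate(1000, {
  coef(lm(sample(y) ~ group, data = toy))[["groupb"]]
})

# Sampling distribution of b1 under a DGP where beta_1 = 0
hist(shuffled_b1, main = "Shuffled b1s", xlab = "b1")

# Proportion of shuffled b1s at least as far from 0 as the sample b1
mean(abs(shuffled_b1) >= abs(sample_b1))
```

If only a small proportion of shuffled $b_1$s are as extreme as the sample $b_1$, the sample would be an unusual result for a DGP where $\beta_1 = 0$.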

#### 3.1 Simulate and plot a DGP where $\beta_1$ = 0 (aka the empty model)

In [None]:
# write code here

#simulate 1000 shuffled b1s


#plot the shuffled b1s in a histogram


#### 3.2 Interpret this distribution.

3.2 Response:

#### 3.3 Use `gf_point()` to plot our sample $b_1$ onto the distribution.

Chain the sample code below onto your plot and replace 'replace_me' with the value of our sample $b_1$.

```
# chain onto your plot with %>% and fix 'replace_me'
gf_point(0 ~ replace_me, color = "red")
```

#### 3.4 Does our sample $b_1$ seem like a common mean difference we might expect if the empty model were true?

3.4 Response:

#### 3.5 Which model of the DGP should we favor?

The empty model (where $\beta_1$ = 0): $length\_mm = \beta_0 + e_i$

The location model (where $\beta_1$ does not equal 0): $length\_mm = 62.30 - 14.77X_i + e_i$

3.5 Response: