# Constructing and Interpreting Sampling Distributions of $b_1$

## Chapter 10.1-10.4 Overview Notebook

In [None]:
# run this to set up the notebook
suppressMessages(library(coursekata))

css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# create the df data frame
id <- 1:10
time <- c(c(10, 12, 15, 48, 70),c(35, 45, 50, 55, 60))
bucket <- c(rep("empty",5),(rep("ice",5)))
df <- data.frame(id, bucket, time)

## 1 Introduction

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-dgp-intro2.jpg" alt="DGP and sample relationship">

In this course, we’ve been fitting models to data (e.g., $Y_i = b_0 + b_1X_i + e_i$). Those are the best-fitting models for the sample we have.

But our real goal isn’t to model our sample. We want to model the Data Generating Process (DGP) that produced it! We want to know the model with the true parameters $Y_i = \beta_0 + \beta_1X_i + e_i$ (not just the parameter estimates, $b_0$ and $b_1$).

The challenge is, we can’t observe the DGP directly. We only have data from a sample. The space between these two is called **the problem of inference**, the gap between what we want to know ($\beta_1$) and what we know ($b_1$). 

Our work in this chapter is learning some new concepts that will help us bridge that gap.

## 2 Experiment: How Long Can You Hold Your Hand in Ice Water?

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/ice-water.png" width = 40% align="right">

Substack blogger Adam Mastroianni conducted an experiment at a party that he <a href="https://www.experimental-history.com/p/three-dumb-studies-for-your-consideration?utm_source=substack&publication_id=656797&post_id=165369325&utm_medium=email&utm_content=share&utm_campaign=email-share&triggerShare=true&isFreemail=true&r=157g9&triedRedirect=true">described here.</a> You can do this experiment in class if you'd like, but we will analyze data from a small class, so no need to freeze your hands if you don't want to!

**Design:** In this experiment, students were randomly assigned to one of two conditions. 
- **Ice bucket:** Students were asked to put their right hand in a bucket of ice water and hold it there as long as they wanted.
- **Empty bucket:** Students were asked to put their right hand in a totally empty bucket (no water, no ice, nothing) and hold it there as long as they wanted.

**Contents of `df` (short for data frame):** This data frame has 10 students and 3 variables.

- `id` a unique number for each student, 1-10
- `bucket` the condition each student was randomly assigned to, either `ice` or `empty`
- `time` the number of seconds the student kept their hand in the bucket
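
If you'd like to check these contents yourself, a quick sketch in base R might look like the following. It rebuilds `df` the same way the setup cell does, so it runs on its own.

```r
# rebuild the df data frame, as in the setup cell
id <- 1:10
time <- c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
bucket <- c(rep("empty", 5), rep("ice", 5))
df <- data.frame(id, bucket, time)

str(df)            # structure: one row per student, one column per variable
table(df$bucket)   # number of students in each condition
```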

<div class="guided-notes">

### 2.1 Look at the plot of `time ~ bucket`. Two groups of students are shown, but the labels for the groups have been removed. Which data points do you think belong to which bucket?
    
Which group looks like it has a higher average time?

</div>


<div class="guided-notes">

### 2.2 Write a word equation to represent the hypothesis that `bucket` would affect `time`
    
</div>

<div class="discussion-question">

### 2.3 Discussion: Can this word equation represent the hypothesis that people would keep their hands longer in the empty bucket? How about the hypothesis that people would keep their hands longer in the ice bucket?
    
</div>


<div class="discussion-question">

### 2.4 Discussion: Let's run the code. Was your hypothesis supported or contradicted? What might explain why we see this pattern of variation?
    
(Label the graph appropriately in your guided notes.)
    
</div>


In [None]:
# run this code
gf_jitter(time ~ bucket, data=df, height = 0, width = .1) 

### 2.5 Run this code to fit the model and print out the parameter estimates.

In [None]:
# run this code
bucket_model <- lm(time ~ bucket, data=df)
bucket_model

<div class="guided-notes">
    
### 2.6 Using the R output, draw the model predictions on the jitter plot
    
Also annotate the visualization to show where $b_0$ and $b_1$ are

</div> 

In [None]:
# run this code
gf_jitter(time ~ bucket, data=df, height = 0, width = .1) %>%
  gf_model(bucket_model)

<div class="guided-notes">

### 2.7 Interpret the parameter estimates from the model in the context of this experiment
    
Put a star (*) next to the parameter estimate that is most relevant to evaluating our hypothesis.

</div>

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:30%">Parameter estimate</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:70%">Interpretation</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;">b<sub>0</sub> = 31</td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"> </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;">b<sub>1</sub> = 18</td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"> </td>
    </tr>
  </tbody>
</table>
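
One way to see where these two estimates come from is to compute the group means directly. The sketch below uses only base R and rebuilds the data so it runs on its own; `group_means`, `b0_by_hand`, and `b1_by_hand` are names made up for this illustration.

```r
# same data as the setup cell
df <- data.frame(
  bucket = c(rep("empty", 5), rep("ice", 5)),
  time   = c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
)

# mean time in each group
group_means <- tapply(df$time, df$bucket, mean)
group_means   # empty = 31, ice = 49

# b0 is the mean of the reference group ("empty");
# b1 is how much you add to get to the mean of the "ice" group
b0_by_hand <- group_means[["empty"]]
b1_by_hand <- group_means[["ice"]] - group_means[["empty"]]

# compare with the estimates from lm()
coef(lm(time ~ bucket, data = df))
```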


## 3 Modeling the DGP

It's possible that there is some DGP in the world that causes people to keep their hands in ice water 18 seconds longer than in an empty bucket. But let's consider some other possibilities.

<div class="discussion-question">

### 3.1 Is it possible that the true difference ($\beta_1$) is smaller than 18 seconds? Larger than 18 seconds? Negative? 0?
    
What would it mean for $\beta_1$ to be negative? To be 0?

</div>


<div class="guided-notes">

### 3.2 If the true difference between groups were **0** (that is, $\beta_1 = 0$), what would the model of the DGP be? Plug 0 into the GLM equation for the DGP to find out.
    
</div>


### 3.3 Review: Simulating the empty model of the DGP with `shuffle()`

The empty model of the DGP ($\beta_1 = 0$) means that any differences we see between the empty and ice groups are just random chance. If you took these 10 recorded times and split them at random into two groups, one group might look like it kept their hands in longer, purely by chance.
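
Written out, this is just the DGP equation from the Introduction with 0 substituted for $\beta_1$ (a worked sketch, using the same symbols as before):

$$Y_i = \beta_0 + (0)X_i + e_i = \beta_0 + e_i$$

With the $X_i$ term gone, `bucket` no longer helps predict `time`: each student's time is the same $\beta_0$ plus random error.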

And as we saw in prior chapters, we can use `shuffle()` to mimic that kind of DGP! This lets us test whether an empty-model DGP could produce a sample with a $b_1$ as big as 18.
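
To make the idea concrete, here is a minimal base-R sketch of what a single shuffle does. It assumes `shuffle()` behaves like `sample()` with no extra arguments (a random reordering of the values); the name `b1_shuf` and the seed are just for illustration.

```r
set.seed(10)  # arbitrary seed, only so the sketch is reproducible

df <- data.frame(
  bucket = c(rep("empty", 5), rep("ice", 5)),
  time   = c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
)

# randomly reorder the 10 times; the bucket labels stay where they are,
# so any link between bucket and time is broken
df$time_shuf <- sample(df$time)

# b1 for the shuffled data: mean of "ice" minus mean of "empty"
b1_shuf <- mean(df$time_shuf[df$bucket == "ice"]) -
  mean(df$time_shuf[df$bucket == "empty"])
b1_shuf
```

The shuffled column contains exactly the same 10 times as before. Only their pairing with `bucket` has changed, which is why any $b_1$ it produces is due to chance alone.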

Run the code cell below. 
- In **Graph shuffles**, we’ve used `shuffle()` to create and graph a new variable, `time_shuf`, that represents one possible sample generated from an empty-model DGP. 
- In **Model shuffles**, we calculate the $b_1$ from the shuffled data.

In [None]:
IRdisplay::display_html(sprintf(
  '<iframe src="%s" width="%s" height="%s" frameborder="0"></iframe>',
  "https://coursekata.github.io/teaching-apps/shuffle-demo-10-icy.html?data=10,%2012,%2015,%2048,%2070,%2035,%2045,%2050,%2055,%2060&groupA=empty&groupB=ice&outcome=time&group=bucket&tabs=3",
  width = "100%",
  height = "600px"
))


<div class="discussion-question">
    
### 3.4 Discussion Questions: What happens when we click the run button?
- What is the DGP that generates the `time_shuf` variable?
- What is the DGP that generates the `time` variable?

</div>

<div class="guided-notes">

### 3.5 Each time we run the shuffle, we get a new $b_1$ from our empty-model DGP. How are these empty-model-generated $b_1$s distributed? 
- To answer this, run the code below. For example, when we ran it once, we got 10. That dot has already been added to the plot.
- Each time you run the code, a new $b_1$ will be generated. Add a dot for it on the plot.
- Keep running and plotting until you start to get a sense of what the distribution of $b_1$s from the empty model might look like.
    
*After a few runs, if you find this slow, skip to 3.6 and then continue to plot dots!*
   
</div>

In [None]:
# run this which shuffles time 
# and calculates b1 all in one line of code
b1(shuffle(time) ~ bucket, data = df)

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-draw-b1-dots.jpg" alt="put dots here">

<div class="guided-notes">

### 3.6 Running one shuffle at a time is slow! Fortunately, we can use `do()` to run the same code multiple times and collect the results. Fill in the missing cell in the table below.

</div>

In [None]:
# modify this 
b1(shuffle(time) ~ bucket, data = df)


<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:30%">Task</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:70%">R Code</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; vertical-align: middle;">Do one shuffle at a time</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: middle;"><code>b1(shuffle(time) ~ bucket, data = df)</code></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; vertical-align: middle;">Do 5 shuffles at a time</td>
      <td style="border: 1px solid black; text-align: left; vertical-align: middle;"></td>
    </tr>
  </tbody>
</table>


<div class="discussion-question">

### 3.7 Discussion Questions:

- Where do most of the dots seem to cluster?
- What might the shape, center, and spread of the shuffled $b_1$s look like?
- Do these shuffled $b_1$s make sense as “children” of an empty-model DGP where $\beta_1 = 0$? Why or why not?

</div>

### An example of the figure with some shuffled $b_1$s drawn in as dots

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-draw-b1-dots3.jpg" alt="with a few dots drawn in representing shuffled b1s">

<div class="guided-notes">

### 3.8 Where does our sample $b_1$ from the study fall? Draw an arrow and label where the actual $b_1$ falls relative to these shuffled $b_1$s.

Is it plausible that our observed $b_1$ could have been created by a DGP where $\beta_1 = 0$? What makes you say that?

</div>


## 4 Using `shuffle` to Create a Sampling Distribution

Up to now, we’ve been looking at a few $b_1$s generated by a DGP where $\beta_1$ is 0 (no effect of bucket).

In this section, we’ll use many shuffles (like 1,000!) to build a **sampling distribution** of $b_1$s under that empty-model DGP.

We can use that bigger sampling distribution as a *probability distribution* to ask: What is the likelihood of generating a $b_1$ as extreme as 18 if $\beta_1 = 0$ were really true?

<div style="font-size: 18px; line-height: 1.4; border: 2px solid black; padding: 10px;">

**A sampling distribution** is a distribution of parameter estimates (like $b_1$), not raw data. It is generated by repeatedly fitting a model to many different samples from the same data generating process (DGP). 
    
So far we have been creating a sampling distribution of $b_1$s from the empty-model DGP.
    
A sampling distribution is distinct from both the distribution of the *sample data* and the *population distribution*.
    
</div>
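
The definition above can be sketched in base R. This is only an illustration of the idea, not the CourseKata code you will write in 4.1: it assumes `sample()` as a stand-in for `shuffle()` and `replicate()` as a stand-in for `do()`, and the names `one_shuffled_b1` and `sdob1_sketch` are made up here.

```r
set.seed(104)  # arbitrary seed, only so the sketch is reproducible

df <- data.frame(
  bucket = c(rep("empty", 5), rep("ice", 5)),
  time   = c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
)

# one "sample" from the empty-model DGP: shuffle the times, then compute b1
one_shuffled_b1 <- function() {
  time_shuf <- sample(df$time)
  mean(time_shuf[df$bucket == "ice"]) - mean(time_shuf[df$bucket == "empty"])
}

# repeat 1000 times and collect the b1s in a data frame
sdob1_sketch <- data.frame(b1 = replicate(1000, one_shuffled_b1()))

head(sdob1_sketch)      # each row is one parameter estimate, not one student
mean(sdob1_sketch$b1)   # average of the shuffled b1s
```

Notice that the resulting data frame has 1000 rows even though the study had only 10 students. Each row is a $b_1$, which is what makes this a sampling distribution rather than a distribution of data.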

### 4.1 Modify the code below to run the empty-model DGP 1000 times to make a sampling distribution of many of the possible $b_1$s it could generate.

Then save these into a data frame called `sdob1` (**s**ampling **d**istribution **o**f **b1**). Write some code to examine the contents of `sdob1`.

In [None]:
# modify this 
do(5) * b1(shuffle(time) ~ bucket, data = df)



### 4.2 Visualize the sampling distribution of $b_1$ saved in the `sdob1` data frame

Run the code below to see a dot plot of the 1000 $b_1$s (way faster than drawing them in).

Then modify the code to create a histogram, which is the more common way to visualize a distribution.

In [None]:
# modify this
gf_dotplot(~ b1, binwidth = 1, data = sdob1)



<div class="discussion-question">
    
### 4.3 Discussion Questions: What is represented in this distribution? 
- How is this histogram similar to and different from the dots you drew earlier?
- What does a very tall bar represent? A very short bar?
- Which are less likely values of $b_1$ to be generated from this empty-model DGP?
- Why does this distribution appear to be centered at 0?

</div>

<div class="guided-notes">

### 4.4 Where does our sample $b_1$ fall on this histogram? Add code to represent it as a dot.
    
What does its position tell you about how our sample relates to the empty-model DGP?
    
</div>

In [None]:
# this saves the sample_b1
sample_b1 <- b1(time ~ bucket, data=df)

# modify this to make a histogram of the sampling distribution
# then add the sample_b1 as a dot
#gf_histogram(~__, data=__) %>%
#  gf_point(__ ~ __, color="red")


## 5 Using the sampling distribution of $b_1$ as a probability distribution

Back in Chapter 6, we explored the idea of using a distribution of data as a probability distribution. We can use a sampling distribution in much the same way. 

Some shuffled $b_1$s are more likely than others; the ones closer to 0 are more common, while very large or very small $b_1$s are rare.

<div class="guided-notes">

### 5.1 Shade in the histograms to show the following proportions. Then estimate the proportion for each case.

</div>

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; text-align: left; width:50%">$b_1 > 0$ (positive $b_1$s)<br><br>
        Estimated proportion: </td>
      <td style="border: 1px solid black; text-align: left; width:50%">$b_1 < 0$ (negative $b_1$s)<br><br>
        Estimated proportion: </td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-histogram-to-shade-in2.jpg" alt="histogram to shade in"></td>
      <td style="border: 1px solid black; text-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-histogram-to-shade-in2.jpg" alt="histogram to shade in"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; width:50%">$b_1 > 20$<br><br>
        Estimated proportion: </td>
      <td style="border: 1px solid black; text-align: left; width:50%">$b_1 > 5$<br><br>
        Estimated proportion: </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-histogram-to-shade-in2.jpg" alt="histogram to shade in"></td>
      <td style="border: 1px solid black; text-align: center;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-histogram-to-shade-in2.jpg" alt="histogram to shade in"></td>
    </tr>
  </tbody>
</table>


### 5.2 Those proportions can also be interpreted as probabilities. 

For example: 
- 0.50 represents the probability that a DGP where $\beta_1 = 0$ could produce a sample with a positive $b_1$. There is a 50% chance that the empty model will produce a positive $b_1$. 
- The probability of generating a $b_1 > 20$ is much smaller than that (around 0.05-0.10). It's less likely that the empty model can produce a $b_1$ bigger than 20 (versus bigger than 0).
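
If you want to check your shaded estimates, proportions like these can be computed rather than eyeballed. The sketch below is base R only and rebuilds a shuffled sampling distribution so it runs on its own; `b1s`, `p_positive`, and `p_over_20` are illustrative names, and your numbers will differ a little from run to run because shuffling is random.

```r
set.seed(20)  # arbitrary seed, only so the sketch is reproducible

time   <- c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
bucket <- rep(c("empty", "ice"), each = 5)

# 1000 shuffled b1s (sample() stands in for shuffle())
b1s <- replicate(1000, {
  time_shuf <- sample(time)
  mean(time_shuf[bucket == "ice"]) - mean(time_shuf[bucket == "empty"])
})

# b1s > 0 is a vector of TRUE/FALSE; its mean is the proportion of TRUEs
p_positive <- mean(b1s > 0)
p_over_20  <- mean(b1s > 20)
c(p_positive, p_over_20)
```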


<div class="discussion-question">

### 5.3 Discussion Question: What do you think is the probability of this DGP generating a $b_1$ greater than 40? If the sample $b_1$ is very unlikely (like 40, 50, or even -2000), would you believe that the empty-model DGP ($\beta_1 = 0$) produced it? Or would you reject it as a possible model of the DGP that produced this sample?
 
</div>

<div class="discussion-question">

### 5.4 Discussion Question: What do you think is the probability of this DGP generating a $b_1$ greater than 1? If the sample $b_1$ is not so unlikely (like -2 or 5), would you believe that the empty-model DGP ($\beta_1 = 0$) produced it? Or would you reject it as a possible model of the DGP?
 
</div>

<div class="guided-notes">

### 5.5 Use the table below to summarize how probability relates to decisions about the empty-model DGP.

</div>

<table border="1" style="font-size: 18px; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: left; width:50%">If the sample $b_1$ is <i>very unlikely</i> to be generated by the empty-model DGP:</td>
      <td style="border: 1px solid black; font-weight: bold; text-align: left; width:50%">If the sample $b_1$ is <i>not so unlikely</i> to be generated by the empty-model DGP:</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; padding: 8px;">What would the probability be like?<br><br>
      </td>
      <td style="border: 1px solid black; text-align: left;">What would the probability be like?<br><br></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; padding: 8px;">Would you believe that the empty-model DGP could have produced it or reject it?<br><br>
      </td>
      <td style="border: 1px solid black; text-align: left;">Would you believe that the empty-model DGP could have produced it or reject it?<br><br></td>
    </tr>
  </tbody>
</table>


<div class="guided-notes">

### 5.6 Where are the unlikely $b_1$s in the sampling distribution? Draw a border around those parts of the sampling distribution (ideally in a light color, e.g., a yellow highlighter).
</div>

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.1-10.4-overview-histogram-to-shade-in2.jpg" alt="sampling distribution to label" width = 70%>

### 5.7 If a sample $b_1$ falls in the extreme upper or lower tail of the sampling distribution, we might reject the empty-model (null-model) DGP as the true model.

In statistics, this is called a **two-tailed test** because we could reject the empty model if the sample $b_1$ was extreme in either direction:
- Upper tail: very large positive $b_1$ (far above 0)
- Lower tail: very negative $b_1$ (far below 0)
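
Because either tail counts as extreme, one way to ask how often the empty-model DGP produces a $b_1$ at least as extreme as our 18 is to count both tails. The following is a base-R sketch under the same assumptions as the rest of this notebook's shuffling (with `sample()` standing in for `shuffle()`); `b1s`, `p_upper`, and `p_lower` are illustrative names.

```r
set.seed(57)  # arbitrary seed, only so the sketch is reproducible

time   <- c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
bucket <- rep(c("empty", "ice"), each = 5)

# 1000 shuffled b1s from the empty-model DGP
b1s <- replicate(1000, {
  time_shuf <- sample(time)
  mean(time_shuf[bucket == "ice"]) - mean(time_shuf[bucket == "empty"])
})

p_upper <- mean(b1s >= 18)    # upper tail: ice group at least 18 seconds longer
p_lower <- mean(b1s <= -18)   # lower tail: empty group at least 18 seconds longer
p_upper + p_lower             # proportion at least as extreme in either direction
```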

<div class="guided-notes">

### 5.8 Label the tails of the sampling distribution.

- Which side is the upper tail? Which is the lower tail?
- Which tail represents people keeping their hands longer in the ice bucket?
- Which tail represents people keeping their hands longer in the empty bucket?

</div>

### 5.9 But what counts as “unlikely”? Introducing the community standard: alpha ($\alpha$)

Different people might choose different cutoff points for what feels unlikely. So the research community uses a shared standard for what counts as "unlikely," called **the alpha level** ($\alpha$).

> Alpha ($\alpha$) is the cutoff probability for deciding that a sample result is unlikely enough to reject the empty model. In many fields, $\alpha = 0.05$.

If you take the 1000 $b_1$s and line them up in order, the lowest 2.5% and the highest 2.5% of values together make up the most extreme 5%, and are therefore the values least likely to be randomly generated.

We can use the `middle()` function to color the middle 0.95 of $b_1$s in our histogram differently than the most extreme 0.05 of $b_1$s.
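
To see the "line them up in order" idea in numbers, here is a base-R sketch that finds the two cutoff values with `quantile()`. This is only an approximation of what `middle()` does; it may treat values that land exactly on a cutoff differently. The names `b1s`, `cutoffs`, and `in_middle` are illustrative.

```r
set.seed(95)  # arbitrary seed, only so the sketch is reproducible

time   <- c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
bucket <- rep(c("empty", "ice"), each = 5)

# 1000 shuffled b1s from the empty-model DGP
b1s <- replicate(1000, {
  time_shuf <- sample(time)
  mean(time_shuf[bucket == "ice"]) - mean(time_shuf[bucket == "empty"])
})

# values that cut off the lowest 2.5% and the highest 2.5%
cutoffs <- quantile(b1s, probs = c(0.025, 0.975))
cutoffs

# TRUE for b1s between the cutoffs, FALSE for b1s in the tails
in_middle <- b1s >= cutoffs[[1]] & b1s <= cutoffs[[2]]
mean(in_middle)   # proportion of b1s in the middle region
```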


In [None]:
# run this
gf_histogram(~b1, data = sdob1, fill = ~ middle(b1, .95))

<div class="guided-notes">

### 5.10 Shade in the alpha level (the most extreme 5% of $b_1$s) on your sampling distribution

Also note how to use the `middle()` function with a histogram

<br><center><code>gf_histogram(~b1, data = sdob1, _______________________________________)</code></center>

</div>

<div class="guided-notes">

### 5.11 The moment of truth: Does the sample $b_1$ fall in the “unlikely” $\alpha$ zone?

- Add your sample $b_1$ to the sampling distribution.
- Does it fall inside the shaded $\alpha$ region, or within the middle 95%?
- Based on what you see, would you reject the empty model of the DGP?

</div>

In [None]:
# run this
gf_histogram(~b1, data = sdob1, fill = ~ middle(b1, .95)) %>%
  gf_point(0 ~ sample_b1, color = "red", size = 4)

<div class="discussion-question">
    
### 5.12 Discussion Question: What if we had set the alpha level for this study as 0.30? How would that change our decision about whether to reject the empty model?

</div>
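
The link between alpha and the decision can also be sketched as a small helper. This is a hypothetical function written for this illustration (`falls_in_tails` is not part of CourseKata), and it uses `quantile()` cutoffs as a stand-in for the shading that `middle()` does.

```r
set.seed(30)  # arbitrary seed, only so the sketch is reproducible

time   <- c(10, 12, 15, 48, 70, 35, 45, 50, 55, 60)
bucket <- rep(c("empty", "ice"), each = 5)

# 1000 shuffled b1s from the empty-model DGP
b1s <- replicate(1000, {
  time_shuf <- sample(time)
  mean(time_shuf[bucket == "ice"]) - mean(time_shuf[bucket == "empty"])
})

# hypothetical helper: is `value` outside the middle (1 - alpha) of the b1s?
falls_in_tails <- function(value, b1s, alpha) {
  cutoffs <- quantile(b1s, probs = c(alpha / 2, 1 - alpha / 2))
  value < cutoffs[[1]] || value > cutoffs[[2]]
}

sample_b1 <- 18
falls_in_tails(sample_b1, b1s, alpha = 0.05)
falls_in_tails(sample_b1, b1s, alpha = 0.30)
```

A larger alpha widens the tails, so a $b_1$ that counts as unlikely at $\alpha = 0.05$ will also count as unlikely at $\alpha = 0.30$, but not necessarily the other way around.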

In [None]:
# modify this to shade the sampling distribution for an alpha = .30
gf_histogram(~b1, data = sdob1, fill = ~ middle(b1, .95)) %>%
  gf_point(0 ~ sample_b1, color = "red", size = 4)
