# P-Values and Hypothesis Testing 
## Chapter 10.5-10.7 Overview Notebook

In [None]:
# run this to set up the notebook
suppressMessages(library(coursekata))

css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# read in data frame
df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQKXVgu53fsIK6jPh79PbP-nsHQ1J5y-A0fFC_nB_ejeJco_cIePu0hHbYGCCQlnJr9bksE5wFF6Ce1/pub?gid=0&single=true&output=csv")

supernova(lm(time ~ bucket, data = df))

## 1 Recap of the Problem of Inference

We want to know what’s really going on out there in the world: the Data Generating Process (DGP), represented by $\beta_1$. But all we have is a sample (and its $b_1$).

To bridge that gap, we simulated a DGP where the empty model was true ($\beta_1 = 0$) by using `shuffle()`, and we generated many randomized $b_1$s (1000 of them, but we can always simulate more). Together, these made up a sampling distribution of $b_1$s.

Even though every $b_1$ came from random shuffling, the sampling distribution showed us that:
- Some $b_1$ are more likely to be produced by this DGP
- Others are unlikely to be produced by the same DGP

By comparing our real $b_1$ to this sampling distribution, we can decide whether it’s plausible that our data came from the empty model:
- If our $b_1$ falls in the unlikely zone, we might reject the empty model.
- If it’s not in the unlikely zone, we might continue to entertain the empty model as plausible.
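
If you want to see the moving parts of this logic without any helper functions, here is a minimal base-R sketch. It uses made-up data (not the ice-water data), and the names `sim`, `get_b1`, `shuffled_b1s`, and `cutoffs` are ours, invented just for this illustration. `sample()` plays the role that `shuffle()` plays in the course code.

```r
# Made-up data for illustration only (NOT the ice-water data set)
set.seed(10)
sim <- data.frame(
  group   = rep(c("a", "b"), each = 30),
  outcome = rnorm(60, mean = 20, sd = 10)
)

# For a two-group model, b1 is the difference between the two group means
get_b1 <- function(y, g) mean(y[g == "b"]) - mean(y[g == "a"])

# Shuffle the outcome 1000 times; shuffling breaks any link between outcome
# and group, which mimics a DGP where beta_1 = 0
shuffled_b1s <- replicate(1000, get_b1(sample(sim$outcome), sim$group))

# Cutoffs separating the middle 95% from the two 2.5% tails
cutoffs <- quantile(shuffled_b1s, c(.025, .975))

# Does the sample b1 land in the unlikely zone (either tail)?
sample_b1 <- get_b1(sim$outcome, sim$group)
in_unlikely_zone <- unname(sample_b1 < cutoffs[1] | sample_b1 > cutoffs[2])
in_unlikely_zone
```

Because the shuffles are random, the exact cutoffs shift a little every time you change the seed, but the reasoning stays the same.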

<div class="guided-notes">
    
### 1.1  Which is the situation where we would *reject* the empty model as the DGP that generated our sample? Which is the *do not reject* situation? 
    
Note which is which in the cells provided and write a brief explanation why.
    
</div>

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:50%">Reject empty-model DGP? Do not reject?<br><br></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:50%">Reject empty-model DGP? Do not reject?<br><br></td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.5-10.7-overview-not.jpg" alt="do not reject"></td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.5-10.7-overview-reject.jpg" alt="reject"></td>
    </tr>
  </tbody>
</table>


<div class="guided-notes">
    
### 1.2 The shaded regions together make up 5% of the sampling distribution. What is this area called? How much is in each tail? 

Label the shaded regions shown in the sampling distributions (above).
    
</div>

<div class="guided-notes">
    
### 1.3 We have introduced a number of labels for the empty model of the DGP. Write down as many ways of saying “empty model of the DGP” as you can.

(We’ll introduce two more: "the null model" or "null hypothesis".)
    
</div>

### 1.4 Shifting from a Visual Strategy to P-Value 

So far, we decided whether to reject or not reject the empty-model DGP based on where our sample statistic fell in the shuffled sampling distribution.

In this notebook, we’re going to calculate the actual probability of getting a sample statistic (e.g., $b_1$) as extreme as, or more extreme than, the one we got, assuming the empty model is the true DGP.

This probability is called the **p-value**. It gives us a sense of how *unlikely* our sample $b_1$ is under the empty-model DGP.

We can then compare the p-value to $\alpha$ to decide whether to reject or fail to reject the empty-model DGP.
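
The comparison itself is simple enough to write as code. The sketch below uses a hypothetical helper called `decide()` (our own name, not a coursekata function) and two made-up p-values. It assumes the common convention of rejecting when the p-value is less than $\alpha$.

```r
# Hypothetical helper for illustration (not part of coursekata)
decide <- function(p_value, alpha = .05) {
  ifelse(p_value < alpha,
         "reject the empty model",
         "do not reject the empty model")
}

# Two made-up p-values, one on each side of alpha = .05
decide(c(.03, .20))
```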

## 2 Experiment: How Long Can You Hold Your Hand in Ice Water?

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/ice-water.png" width = 40% align="right">

Remember the hand-in-ice-water study? We're using the same experiment but this time with 60 participants (instead of 10). 

The data frame is again called `df`:
- `bucket` the condition each student was randomly assigned to, either `ice` or `empty`
- `time` the number of seconds the student kept their hand in the bucket

### 2.1 Exploring Variation and Modeling the Data

Take a moment to look through the word equation, code, and data visualization in your guided notes.

<table border="1" style="font-size: 16px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <tbody>
    <tr>
        <td style="border: 1px solid black; text-align: left; width:50%; vertical-align: top;"><b>Word Equation</b><br>time = bucket + other stuff</td>
      <td style="border: 1px solid black; text-align: left; width:50%" rowspan = 4>R output of bucket_model:<pre><code> (Intercept)    bucketice  
      17.433        8.533</code></pre>
          <img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.5-10.7-overview-df-60.jpg" alt="data viz with model overlaid">
        </td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><b>Explore Variation</b><br><code>gf_jitter(time ~ bucket, data=df, width = .1)</code></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><b>Model Variation</b><br><code>bucket_model <- lm(time ~ bucket, data=df)</code><br><br>
          <b>2.2</b> GLM Notation:</td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; vertical-align: top;"><b>Evaluate Model</b><br>
        <b>2.4</b> </td>
    </tr>
  </tbody>
</table>


<div class="guided-notes"> 

### 2.2 Write the GLM notation for this model in the space provided. 

Also write a version that includes the best-fitting parameter estimates.

</div>

<div class="discussion-question">

### 2.3 Discussion Question: What do you notice in the visualization? Where do you see the value of $b_1$ in the visualization? If you had to guess, what do you think the PRE is for this model?
</div>

In [None]:
# run this code
bucket_model <- lm(time ~ bucket, data=df)

gf_jitter(time ~ bucket, data=df, height = 0, width = .1) %>%
  gf_model(bucket_model)

<div class="guided-notes"> 

### 2.4 Write the R code that will calculate measures such as SSE, F, and PRE to help us evaluate this model 
    
Is the PRE what you thought it would be?
</div>

In [None]:
# code here



## 3 From the ANOVA Table's P-Value to a Shuffle-Based Picture

The ANOVA output reports a p-value! Check it out.

**What is the p-value?** The p-value is the probability of getting a sample statistic (e.g., $b_1$) as extreme as, or more extreme than, the one we got, if the empty model were the true DGP. 

Just as alpha ($\alpha$) is a two-tailed probability, so is the p-value. For now, know that the p-value, like $\alpha$, has both a positive and a negative tail. 
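
To make "two-tailed" concrete, here is a rough base-R sketch that estimates a p-value from shuffles by adding up the two tails. The data are made up (not the ice-water data), and every object name here (`get_b1`, `shuffled_b1s`, `p_value_sim`, and so on) is ours, chosen just for this illustration.

```r
# Made-up data for illustration only (NOT the ice-water data set)
set.seed(42)
group   <- rep(c("a", "b"), each = 30)
outcome <- rnorm(60, mean = 20, sd = 10)

# For a two-group model, b1 is the difference between the two group means
get_b1 <- function(y, g) mean(y[g == "b"]) - mean(y[g == "a"])

sample_b1    <- get_b1(outcome, group)
shuffled_b1s <- replicate(1000, get_b1(sample(outcome), group))

# Proportion of shuffled b1s at least as far above 0 as the sample b1
upper_tail <- mean(shuffled_b1s >=  abs(sample_b1))
# Proportion of shuffled b1s at least as far below 0 as the sample b1
lower_tail <- mean(shuffled_b1s <= -abs(sample_b1))

# The simulated two-tailed p-value is the sum of both tails
p_value_sim <- upper_tail + lower_tail
p_value_sim
```

A different seed gives a slightly different answer, because each run produces a different set of 1000 shuffles.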


<div class="discussion-question"> 
    
### 3.1 In your own words, what does the p-value (.14) represent? (Note how it relates to our sample $b_1 = 8.5$.)

Try your best! We'll make more sense of p-value as we go. This is just a start.

</div>


<div class="discussion-question">
    
### 3.2 Is our sample $b_1$ (~8.5) unlikely? 
    
Make a Prediction: If we generate a shuffled sampling distribution and mark our sample $b_1$ on it, will it fall in the unlikely tails or in the not-unlikely middle? 
    
</div>

<div class="guided-notes">
    
### 3.3 Build the shuffle-based sampling distribution (`sdob1`) and check where our sample $b_1$ actually falls. Fill in the table with the appropriate R code to accomplish each task. 

</div>


<table border="1" style="font-size: 16px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">Task<br><br></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:60%">R Code<br><br></td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;">Generate one $b_1$ by shuffling</td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"><code>b1(shuffle(time) ~ bucket, data = df)</code></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;">Generate a sampling distribution of 1000 $b_1$s (save it as <code>sdob1</code>)</td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;">Save our sample b1 as an R object called <code>sample_b1</code></td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"><code>sample_b1 <- </code></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left; height: 60px;">Plot the sampling distribution of b1s as a histogram and fill it such that the middle .95 are a different color than the .05 tails<br><br>
Add the <code>sample_b1</code> as a dot</td>
      <td style="border: 1px solid black; text-align: left; height: 60px;"><code>gf_histogram(~_______, data = ________,<br>
  fill = ~________________________) %>%<br>
  gf_point(_______ ~ _______________)</code></td>
    </tr>
  </tbody>
</table>







In [None]:
# Modify this code to generate a sampling distribution of 1000 b1s (save it as sdob1)
b1(shuffle(time) ~ bucket, data = df)

# Save our sample b1 as an R object called sample_b1


# Plot the sampling distribution of b1s
# Add the sample_b1 as a dot



<div class="discussion-question">
    
### 3.4 Discussion: Was our prediction from just looking at the p-value correct? How should we use p-value to predict whether the sample $b_1$ would be *unlikely* or *not-unlikely* to come from an empty-model DGP?
    
</div>

<div class="guided-notes">
    
### 3.5 Which sampling distribution depicts alpha? P-value? 

Below are two sampling distributions (each with 100 bins rather than the default 30 bins) shaded with: 
- alpha = .05 (2.5% in each tail)
- p-value = .14 (7% in each tail)
    
Label the columns to show which is which, and label the amounts in each tail.

</div>
   

<table border="1" style="font-size: 16px; margin-left: 0; border-collapse: collapse; width: 100%;">
  <thead>
    <tr>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:50%">  <br></td>
      <td style="border: 1px solid black; font-weight: bold; text-align: center; width:50%">  <br></td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black; text-align: left;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.5-10.7-overview-depict-alpha.jpg" alt="visual depiction of alpha"></td>
      <td style="border: 1px solid black; text-align: left;"><img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.5-10.7-overview-depict-pvalue.jpg" alt="visual depiction of p-value"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black; text-align: left;"><code>fill = ~middle(b1, .95)</code></td>
      <td style="border: 1px solid black; text-align: left;"><code> </code></td>
    </tr>
  </tbody>
</table>







<div class="discussion-question">

### 3.6 Discussion Question: Describe the difference between the visual depiction of alpha and p-value. How did you decide which was which?
    
Also relate the visualization of p-value to our definition: the two-tailed probability of getting a sample statistic (e.g., $b_1$) as extreme as the one we got.

</div>

<div class="guided-notes">
    
### 3.7 Write the `middle()` code that would depict the p-value (.14) on a histogram of the sampling distribution
    
</div>

In [None]:
# run this to depict alpha on a histogram of the sampling distribution
# then modify this code to depict p-value
gf_histogram(~b1, data = sdob1, bins = 100, fill = ~middle(b1, .95)) %>%
  gf_point(0 ~ sample_b1)



<div class="discussion-question">

### 3.8 Discussion Question: Why wouldn't <code>middle(b1, .14)</code> work?

</div>

<div class="guided-notes">

### 3.9 Predict and tally up how many shuffled $b_1$s are more extreme than our sample by running the `tally()` function and filling in the table. 

</div>
    

<table border="1" style="font-size: 16px; margin-left: 0; border-collapse: collapse; width: 100%;"> 
    <thead> 
        <tr> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; width:50%">Criteria</td> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; width:20%">Translation</td> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; width:15%">Predict how many $b_1$s out of 1000</td> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; width:15%">How many?</td> 
        </tr> 
    </thead> 
    <tbody> 
        <tr> 
            <td style="border: 1px solid black; text-align: left;"><code>(b1 > sample_b1)</code></td> 
            <td style="border: 1px solid black; text-align: left;">$b_1$ greater than 8.5</td> 
            <td style="border: 1px solid black; text-align: left;"></td> 
            <td style="border: 1px solid black; text-align: left;"></td> 
        </tr> 
        <tr> 
            <td style="border: 1px solid black; text-align: left;"><code>(b1 < -sample_b1)</code></td> 
            <td style="border: 1px solid black; text-align: left;">$b_1$ less than -8.5</td> 
            <td style="border: 1px solid black; text-align: left;"></td> 
            <td style="border: 1px solid black; text-align: left;"></td> 
        </tr>  
        <tr> 
            <td style="border: 1px solid black; text-align: left;"><code>(b1 > sample_b1 | b1 < -sample_b1)</code></td> 
            <td style="border: 1px solid black; text-align: left;">$b_1$ more extreme than 8.5 (greater than 8.5 OR less than -8.5)</td> 
            <td style="border: 1px solid black; text-align: left;"></td> 
            <td style="border: 1px solid black; text-align: left;"></td> 
        </tr> 
    </tbody> 
</table>

In [None]:
# run then modify this code
tally(~ (b1 > sample_b1), data = sdob1)




<div class="discussion-question"> 

### 3.10 Discussion Question: Why aren't the numbers exactly as we predicted? What would happen if we generate another 1000 $b_1$s by shuffling? 

</div>

In [None]:
# run this to see what happens with another 1000 b1s
sdob1 <- do(1000) * b1(shuffle(time) ~ bucket, data = df)
tally(~ (b1 > sample_b1 | b1 < -sample_b1), data = sdob1)

## 4 From Simulated Sampling Distributions to Mathematical Models

The early statisticians who developed the ideas of sampling distributions and p-values didn’t have computers. They couldn’t actually run thousands of shuffles! They could only imagine what a sampling distribution would look like. They built mathematical models of what they imagined and calculated probabilities from them.

That’s what R does when it gives you a p-value in the ANOVA table. It doesn’t use a sampling distribution of shuffled $b_1$s; it uses a mathematical model of it called **the t-distribution**. 

That's why if you run `supernova(bucket_model)` multiple times, it will show you the same p-value.

In [None]:
# run this a few times
supernova(bucket_model)

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/10.5-10.7-overview-t-dist.jpg" alt="t-distribution overlaid on the density histogram of a sampling distribution of b1s" width = 50% align = "right">

### 4.1 The t-distribution is closely related to the normal distribution 

It's a smooth curve that represents what the sampling distribution would look like if the empty model were true.

The t-distribution changes shape slightly based on the degrees of freedom (specifically df Error, the degrees of freedom left after fitting the model). Whatever the degrees of freedom, though, the curve looks basically like the normal distribution.
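
One way to see how slight the change in shape is: compare where the lower 2.5% tail begins for a few different degrees of freedom. This sketch uses only base R's `qt()` and `qnorm()`; the particular df values are arbitrary picks for illustration, apart from 58, which is the df Error for our bucket model.

```r
# A few degrees of freedom to compare (58 is df Error for the bucket model)
dfs <- c(5, 30, 58, 1000)

# Where the lower 2.5% tail starts on each t-distribution
t_cutoffs <- qt(.025, df = dfs)

# The same cutoff on the standard normal distribution
z_cutoff <- qnorm(.025)

data.frame(df = dfs, lower_cutoff = t_cutoffs)
z_cutoff
```

With few degrees of freedom the tails are a bit heavier, so the cutoff sits farther from 0. As df grows, the t cutoff moves toward the normal cutoff.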

Here we have overlaid the mathematical model on top of the shuffled sampling distribution:


<div class="discussion-question"> 

### 4.2 Discussion Question: What's similar and different between the simulated sampling distribution and the mathematical model of it?

</div>

### 4.3 Run the code below to visualize p-value on a t-distribution using  `xqt()`.

We need to tell `xqt()` the p-value we want it to show and the degrees of freedom. Note that `xqt()` only shows the lower tail so we'll have to divide our p-value by 2.
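
If you are curious about the numbers behind the picture, the same calculation can be sketched with base R's `qt()` and `pt()`, which work without any plotting. The object names (`p_value`, `df_error`, `t_cut`, `p_back`) are ours; the values .1403 and 58 come from the ANOVA table for the bucket model.

```r
p_value  <- .1403   # two-tailed p-value from the ANOVA table
df_error <- 58      # df Error for the bucket model

# Cutoff on the t-distribution with half the p-value below it (lower tail)
t_cut <- qt(p_value / 2, df = df_error)

# Going the other way: double the lower-tail area to recover the p-value
p_back <- 2 * pt(-abs(t_cut), df = df_error)

t_cut
p_back
```

Halving the p-value gives the area in one tail, and doubling the one-tail area takes us back to the two-tailed p-value.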


In [None]:
# run this
xqt(.1403/2, 58)

### 4.4 A special case of modeling: The t-test

When we have a model with just two groups (i.e., the explanatory variable is categorical and it has only two categories), the test of the empty-model DGP is called a **t-test**. R even has a special built-in function for this situation:

`t.test(time ~ bucket, data = df, var.equal=TRUE)` 

But our modeling approach (using `lm()` and `supernova()`) can evaluate the empty-model DGP for any model:
- where the explanatory variable is categorical and has 2 groups (like the t-test)
- where the explanatory variable is categorical and has more than 2 groups
- where the explanatory variable is quantitative 
- where we have more than one explanatory variable

This is why `lm()` and `supernova()` are more powerful and flexible: they generalize the logic of the t-test to all GLM models.
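
Here is a small check of that claim on made-up data (not the ice-water data), using only base R. The data frame `sim` and the object names are ours. With two groups and `var.equal = TRUE`, the t-test p-value should match both the p-value for $b_1$ from `lm()` and the p-value from the model's F test.

```r
# Made-up two-group data for illustration only
set.seed(7)
sim <- data.frame(
  X = rep(c("a", "b"), each = 30),
  Y = rnorm(60, mean = 20, sd = 10)
)

# Special case: two-sample t-test assuming equal variances
p_ttest <- t.test(Y ~ X, data = sim, var.equal = TRUE)$p.value

# General approach: fit the two-group model with lm()
fit <- lm(Y ~ X, data = sim)

# p-value for b1 from the coefficient table, and from the F test
p_lm_t <- summary(fit)$coefficients["Xb", "Pr(>|t|)"]
p_lm_F <- anova(fit)[["Pr(>F)"]][1]

c(t.test = p_ttest, lm_t = p_lm_t, lm_F = p_lm_F)
```

All three numbers agree, which is the sense in which the t-test is a special case of the modeling approach.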


In [None]:
# run this; you'll get the same p-value (0.1403) as we did with supernova
t.test(time ~ bucket, data = df, var.equal=TRUE)


<div class="guided-notes">

### 4.5 Compare the special case (t-test with `t.test()`) with the general approach (modeling with `lm()` and `supernova()`) by filling in the table below.  
    
Mark Yes (✔️) or No (❌) in the blank spaces provided.
    
</div>

<table border="1" style="font-size: 16px; margin-left: 0; border-collapse: collapse; width: 100%;border: 1px solid black;"> 
    <thead> 
        <tr>
            <td style="border: 1px solid black; font-weight: bold; text-align: center; width:20%"> </td> 
            <td style="border: 1px solid black;font-weight: bold; text-align: center; width:40%">Special Case: t-test</td> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; width:40%">General: Modeling Approach</td> 
        </tr> 
    </thead> 
    <tbody> 
        <tr> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; vertical-align: top;">Specific R Code<br>(for the bucket experiment)</td>   
            <td style="border: 1px solid black; text-align: left; vertical-align: top;"><code>t.test(time ~ bucket, data = df, var.equal = TRUE)</code></td>
            <td style="border: 1px solid black; text-align: left; vertical-align: top;"><code>bucket_model <- lm(time ~ bucket, data = df)</code><br><br><code>supernova(bucket_model)</code></td>
        </tr>
        <tr> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; vertical-align: top;">Generic R Code</td> 
            <td style="border: 1px solid black; text-align: left; vertical-align: top;"><code>t.test(Y ~ X, data = df, var.equal = TRUE)</code></td>
            <td style="border: 1px solid black; text-align: left; vertical-align: top;"><code>model <- lm(Y ~ X, data = df)</code><br><br><code>supernova(model)</code></td>
        </tr>
        <tr> 
            <td style="border: 1px solid black; font-weight: bold; text-align: center; vertical-align: top;">What this method can handle</td> 
            <td style="border: 1px solid black; text-align: left; vertical-align: top;">_____ With a categorical X (only 2 groups)<br>
_____ With a categorical X (with more than 2 groups)<br>
_____ With a quantitative X (e.g., regression model)<br>
_____ With models that have more Xs (e.g., X1 + X2)</td>
            <td style="border: 1px solid black; text-align: left; vertical-align: top;">_____ With a categorical X (only 2 groups)<br>
_____ With a categorical X (with more than 2 groups)<br>
_____ With a quantitative X (e.g., regression model)<br>
_____ With models that have more Xs (e.g., X1 + X2)</td>
        </tr>
    </tbody> 
</table>  