# MATH 3350 Course Notes - Module S3

## Introduction to Hypothesis Testing: The Lady Tasting Tea

Recall the scenario where Muriel claimed tea tastes better when the milk is added to the cup before the tea. Muriel was convinced adding the tea first, then the milk, was inferior. She claimed she could taste the difference.
We will generate a simulation of the [blind taste test](https://en.wikipedia.org/wiki/Lady_tasting_tea) that Fisher performed, but simplified.  

### The Scenario as a Simulation
We want to simulate eight cups of tea, where each cup is prepared either by pouring milk first or tea first. We then simulate Muriel GUESSING, for each cup, whether the milk or the tea was poured first. If she is purely guessing between two options, her probability of guessing correctly for any one cup is 0.5.

#### Model 1:  A Coin Toss
Since the probability of a correct guess (assuming Muriel is guessing) is 0.5, we can simulate each guess with a coin toss.

- Let 'HEADS' represent a correct guess
- Let 'TAILS' represent a wrong guess

Define one **TRIAL** as 8 coin tosses. (_Why?_)

Toss your coin 8 times and count the number of heads.  Write this number down.  What does it represent?

Repeat the trial (8 tosses) 5 more times and record your result each time.  Now we have several simulated trials of Muriel guessing (8 guesses per trial).  We can use the data we have collected to make a histogram.

In [None]:
#Complete with number of results for each category
count0 <- rep(0,0 )
count1 <- rep(1,0 )
count2 <- rep(2,0 )
count3 <- rep(3,0 )
count4 <- rep(4,0 )
count5 <- rep(5,0 )
count6 <- rep(6,0 )
count7 <- rep(7,0 )
count8 <- rep(8,0 )

#Put all results in one vector
all_results <- c(count0,count1,count2,count3,count4,count5,count6,count7,count8)
hist(all_results,breaks=c(-1,0,1,2,3,4,5,6,7,8,9))


#### Model 2:  Random Outcome Generator

We can simulate coin tosses (as in Module S0).  Similarly, instead of Heads/Tails, we can simulate YES/NO outcomes for whether the guess is correct.  _**Remember that this is a model to simulate Muriel GUESSING,**_ and we are assuming she has a 50-50 chance of guessing correctly.

- 'Y' (for YES) means the guess is correct
- 'N' (for NO) means the guess is not correct

In [None]:
guess <- c('Y','N')                        #Define a vector with all possible outcomes (yes = correct guess)
results <- sample(guess,8,replace = TRUE)  #Simulates 8 guesses with Yes/No equally likely
print(results)                             #Look at our result vector

num_correct <- sum(results == 'Y')            #Count number of correct guesses

cat('There were',num_correct,'correct out of 8 guesses.')  #Show results in a full sentence  

#### Why might we want to repeat the above trial thousands of times?
We'll repeat this trial 10000 times and store the result (number of correct guesses) for each trial. **Why would we want to do this?**
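
One way to explore this question is to watch how the estimate behaves as the number of trials grows. The sketch below is only an illustration: the helper `prop_all8`, the seed, and the list of trial counts are assumptions made for this example and are not part of the original notes.

```r
set.seed(3350)          # assumed seed, only so the run is reproducible
guess <- c('Y', 'N')    # same outcome vector as in Model 2

# Hypothetical helper: run n trials of 8 guesses each and return
# the proportion of trials in which all 8 guesses were correct
prop_all8 <- function(n) {
  counts <- replicate(n, sum(sample(guess, 8, replace = TRUE) == 'Y'))
  mean(counts == 8)
}

trial_sizes <- c(10, 100, 1000, 10000)      # assumed trial counts to compare
estimates <- sapply(trial_sizes, prop_all8) # one estimate per trial count

print(data.frame(trials = trial_sizes, estimate = estimates))
```

With only a handful of trials, a rare outcome such as 8 out of 8 may never appear at all, so the estimate tends to be unreliable. Larger numbers of trials usually give estimates that vary less from run to run.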

#### Looking at the results
We'll create a histogram to visualize the number of correct guesses (out of 8) over all the trials.  
Remember: EACH trial consists of 8 guesses (1 for each cup of tea).  We are interested in the _number of correct guesses_ in each trial.

In [None]:
num_trials <- 10000                  # set the number of trials
num_correct <- numeric(num_trials)   # create a vector to store the number of correct guesses for each trial

for (i in 1:num_trials){
    results <- sample(guess,8,replace = TRUE) # create a trial of 8 'cups' and guess for each
    num_correct[i] <- sum(results == "Y")     # count and store the number of correct guesses in this trial
}

head(num_correct, 20)        # display results of first 20 trials



In [None]:
hist(num_correct, breaks = seq(-0.5, 8.5, by = 1))   # histogram of all results, one bar for each possible count (0 through 8)

#### How can we use the data we just generated?
Using these results, what is the **empirical probability** that all 8 guesses would be correct?

In [None]:
all8right <- num_correct >= 8
head(all8right,30)


In [None]:
idx_8right <- which(all8right)   #Show vector index of entries with value TRUE
idx_8right


    

In [None]:
all8right[idx_8right]            #Show entries in all8right with above index values

num_correct[idx_8right]          #Show entries in num_correct in the same index values

In [None]:
num_correct[all8right]         #Same result with entire logical vector

In [None]:
count_all8right <- sum(all8right)  #How many trials had all 8 right?
count_all8right

length(num_correct[all8right])     #Second way to find same information

#### What is the empirical probability of getting all 8 correct, IF she is only guessing?
Also, what does this probability suggest?

In [None]:
count_all8right/num_trials
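
As a side note, the count-then-divide calculation can be collapsed into a single step, because the mean of a logical vector is the proportion of `TRUE` entries. The sketch below uses a small made-up vector (`demo_correct`, a hypothetical name with hypothetical values) so that the arithmetic is easy to check by hand.

```r
# Hypothetical results from 10 trials (made-up values for illustration only)
demo_correct <- c(4, 8, 3, 5, 8, 2, 6, 4, 5, 4)

# Two-step version: count the TRUE values, then divide by the number of trials
prob_two_step <- sum(demo_correct >= 8) / length(demo_correct)

# One-step version: the mean of a logical vector is the proportion of TRUE values
prob_one_step <- mean(demo_correct >= 8)

print(prob_two_step)
print(prob_one_step)
```

Two of the ten made-up trials have all 8 correct, so both versions give 0.2.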

#### Model 3:  Random Outcomes from Known Distribution

You may recognize the first two models as equivalent to drawing random outcomes from a binomial distribution with $n = 8$ guesses per trial and success probability $p = 0.5$. Recall that for known distributions, there is another way to generate random outcomes.  We can use this to create a similar set of test data.

In [None]:
successes <- rbinom(num_trials, size=8, prob=0.5)    #10000 trials, 8 teacups, 0.5 chance of guessing correctly
hist(successes, breaks = seq(-0.5, 8.5, by = 1))     #One bar for each possible count (0 through 8)

In [None]:
# Empirical probability from binomial distribution random values simulation
emp_prob <- sum(successes>=8)/num_trials
cat("Model 3 simulation: ", emp_prob, "\n")
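
To see that Model 2 and Model 3 describe the same situation, their relative frequencies can be placed side by side. This is a sketch, not a required part of the module: the seed, the use of `replicate` in place of the `for` loop, and the names `model2`, `model3`, and `comparison` are assumptions made for this example.

```r
set.seed(3350)          # assumed seed, only so the run is reproducible
num_trials <- 10000
guess <- c('Y', 'N')

# Model 2: sample 8 Yes/No guesses per trial and count the correct ones
model2 <- replicate(num_trials, sum(sample(guess, 8, replace = TRUE) == 'Y'))

# Model 3: draw the number of correct guesses directly from a binomial distribution
model3 <- rbinom(num_trials, size = 8, prob = 0.5)

# Relative frequency of each possible count (0 through 8) under each model
freq2 <- table(factor(model2, levels = 0:8)) / num_trials
freq3 <- table(factor(model3, levels = 0:8)) / num_trials

comparison <- data.frame(correct = 0:8,
                         model2 = as.numeric(freq2),
                         model3 = as.numeric(freq3))
print(comparison)
```

The two columns should be close but not identical, since each is built from its own set of random draws.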

### Theoretical Approach

The three methods above all use simulations and/or randomly generated values to find an **empirical probability** based on data we generated. Given what we know about how some distributions behave, we can also directly find the **theoretical probability**.

#### Questions

1. What is the _theoretical probability_ of getting exactly 8 correct guesses in 8 tries?
2. How does the theoretical probability compare to the empirical probability in the simulations above?



In [None]:
# Theoretical probability of exactly 8 correct in 8 guesses
theo <- dbinom(8,size=8,prob=0.5)
cat("Theoretical probability: ", theo, "\n")
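
As a check on the `dbinom` result, the same value can be worked out by hand from the binomial probability formula, with $n = 8$ guesses, $k = 8$ correct, and $p = \frac{1}{2}$ under the assumption that Muriel is guessing:

$$P(X = 8) = \binom{8}{8}\left(\frac{1}{2}\right)^{8}\left(1 - \frac{1}{2}\right)^{0} = \frac{1}{256} \approx 0.0039$$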



### What does it all mean??
* Remember that this is a _**conditional probability**_: it is calculated under the assumption that she was guessing.
* How does the low probability inform your decision about whether our assumption (that she was guessing) is correct?
* These are the beginnings of a **hypothesis test**.

## Hypothesis Tests
These are the steps to conducting a hypothesis test:  
1. Identify a population parameter and state null and alternative hypotheses about the parameter.<br>These should be stated in such a way that the _**null**_ hypothesis is the "default" assumption unless there is sufficiently strong evidence in favor of the _**alternative**_ hypothesis.
    
    
2. Create a model consistent with the NULL HYPOTHESIS.<br>This could be a _theoretical_ model (e.g., an exact probability distribution) or an _empirical_ model (e.g., a simulation).
    
    
3. Use the model to determine the probability that results as extreme as those we observed would occur purely by random chance IF the null hypothesis were true. (This probability is called a p-value.)


4. Based on the p-value, decide whether to reject the null hypothesis in favor of the alternative.


5. Draw a conclusion in the context of the scenario given.  


**_To complete a hypothesis test, you must carry out steps 4 and 5 after the p-value is found._**  

Below, we will examine how we carried out each of these steps.

### Step 1: Identifying a Parameter and Hypotheses

- Our parameter is the proportion of times (in the long run, or "on average") that Muriel will correctly identify the order that milk and tea were poured.
- Our NULL hypothesis is our default assumption unless there is strong evidence to the contrary<br>**NULL Hypothesis:** Muriel is guessing, so in the long run, the proportion identified correctly is one half, or $p = \frac{1}{2}$
    
- Our ALTERNATIVE hypothesis is the other possibility we are exploring<br>**ALTERNATIVE Hypothesis:** Muriel can actually detect a difference, so the proportion identified correctly is better than one half, or $p > \frac{1}{2}$
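
For reference, base R's `binom.test` function accepts these two hypotheses as arguments. The sketch below assumes, for illustration, an observed result of 8 correct out of 8 cups, which is the outcome examined in the simulations above. The variable names are hypothetical.

```r
# Assumption for illustration: the observed result is 8 correct out of 8 cups
observed_correct <- 8
num_cups <- 8

# p = 0.5 encodes the null hypothesis (p = 1/2);
# alternative = "greater" encodes the alternative hypothesis (p > 1/2)
test_result <- binom.test(x = observed_correct, n = num_cups,
                          p = 0.5, alternative = "greater")

print(test_result$null.value)   # proportion assumed under the null hypothesis
print(test_result$alternative)  # direction of the alternative hypothesis
print(test_result$p.value)      # exact binomial p-value (see Step 3)
```

Choosing `alternative = "greater"` reflects that we are only looking for evidence that Muriel does better than chance, not worse.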

### Step 2: Creating the Model of the Null Hypothesis

We explored two methods of creating a model of the null hypothesis:

1. Use simulation/randomization to create an empirical model and find a p-value.
2. Use a theoretical distribution to find a p-value. (There is sometimes more than one suitable theoretical distribution.)

### Step 3: Use the Model to Find a p-Value

We found a p-value for each model.  The p-value is interpreted as the following _conditional probability_:<br>
_**If we assume that the null hypothesis is true**, the p-value is the probability that we would see results as extreme as the ones we have observed, just by random chance alone._
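
Because the alternative hypothesis is $p > \frac{1}{2}$, "as extreme as" here means "at least as many correct guesses as were observed." The sketch below computes that tail probability both ways for a hypothetical observed result of 7 correct out of 8. The value 7, the seed, and the variable names are assumptions chosen only to show that a p-value can include more than one outcome.

```r
set.seed(3350)          # assumed seed, only so the run is reproducible
num_trials <- 10000
observed_correct <- 7   # hypothetical observed result, chosen for illustration

# Theoretical p-value: P(X >= 7) when X is binomial with size 8 and prob 0.5
p_theoretical <- sum(dbinom(observed_correct:8, size = 8, prob = 0.5))

# Empirical p-value: proportion of simulated trials at least as extreme as observed
successes <- rbinom(num_trials, size = 8, prob = 0.5)
p_empirical <- mean(successes >= observed_correct)

cat("Theoretical p-value:", p_theoretical, "\n")
cat("Empirical p-value:  ", p_empirical, "\n")
```

The theoretical value is $\frac{8 + 1}{256} = \frac{9}{256} \approx 0.035$, since there are 8 ways to get exactly 7 correct and 1 way to get all 8 correct. The empirical value should land near it.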

### Step 4: Decide Whether Evidence is Sufficient to Reject Null Hypothesis

Based on our p-values, is there strong evidence for the alternative hypothesis instead of the null hypothesis?  If so, we reject the null hypothesis.  Today's scenario may seem decisive, but we will discuss in future lessons how to decide whether our evidence is strong enough.

### Step 5: Conclusion

The decision made in step 4 implies that we have drawn a conclusion about the original question we were investigating.

- What was our original question?
- What is our conclusion?