# MATH 3350 Course Notes - Module S2

## Introduction to Hypothesis Testing: The Lady Tasting Tea, Revisited

Recall the scenario in which Muriel claimed that tea tastes better when the milk is poured into the cup before the tea. She was convinced that adding the tea first, then the milk, was inferior, and she claimed she could taste the difference.
We will simulate a simplified version of the [blind taste test](https://en.wikipedia.org/wiki/Lady_tasting_tea) that Fisher performed.

#### The scenario
We want to simulate eight cups of tea, where each cup is prepared either by pouring milk first or tea first. We then simulate Muriel GUESSING, for each cup, whether the milk or the tea went in first. Because she is only guessing between two options, her probability of being correct on any one cup is 0.5.

In [None]:
guess <- c('Y','N')                        #Define a vector with all possible outcomes (yes = correct guess)
results <- sample(guess,8,replace = TRUE)  #Simulates 8 guesses with Yes/No equally likely
print(results)                             #Look at our result vector

num_correct <- sum(results == 'Y')            #Count number of correct guesses

cat('There were',num_correct,'correct out of 8 guesses.')  #Show results in a full sentence  

#### Why might we want to repeat the above trial thousands of times?
We'll repeat this trial 10000 times and store the result (number of correct guesses) for each trial. **Why would we want to do this?**
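
One way to explore this question is to compare how much an estimated probability varies when it is based on few trials versus many. The sketch below is illustrative only: the helper `estimate_p8`, the seed, and the run sizes (200 vs. 5000 trials, 30 repetitions each) are assumptions chosen for the example, not part of the original exercise.

```r
set.seed(3350)                      # arbitrary seed so the sketch is reproducible
guess <- c('Y', 'N')                # same outcome vector as above

# Hypothetical helper: estimate P(all 8 correct) from n_trials simulated trials
estimate_p8 <- function(n_trials) {
    n_right <- replicate(n_trials, sum(sample(guess, 8, replace = TRUE) == 'Y'))
    mean(n_right == 8)              # proportion of trials with all 8 correct
}

small_runs <- replicate(30, estimate_p8(200))    # 30 estimates, 200 trials each
large_runs <- replicate(30, estimate_p8(5000))   # 30 estimates, 5000 trials each

cat('Spread (sd) of estimates with  200 trials:', sd(small_runs), '\n')
cat('Spread (sd) of estimates with 5000 trials:', sd(large_runs), '\n')
```

If the simulation behaves as expected, the estimates based on more trials should cluster more tightly around a single value.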

#### Looking at the results
We'll create a histogram to visualize the number of correct guesses (out of 8) over all the trials.  
Remember: EACH trial consists of 8 guesses (1 for each cup of tea).  We are interested in the _number of correct guesses_ in each trial.

In [None]:
num_trials <- 10000                  # set the number of trials
num_correct <- numeric(num_trials)   # preallocate a vector to store the number of correct guesses for each trial

for (i in 1:num_trials){
    results <- sample(guess,8,replace = TRUE) # create a trial of 8 'cups' and guess for each
    num_correct[i] <- sum(results == "Y")     # count and store the number of correct guesses in this trial
}

head(num_correct, 20)        # display results of first 20 trials
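
The loop above can also be written more compactly. This is a minimal sketch, assuming base R's `replicate()` is acceptable here; the name `num_correct_rep` and the seed are introduced only for this illustration.

```r
set.seed(3350)                     # arbitrary seed, only for reproducibility
guess <- c('Y', 'N')               # possible outcomes (Y = correct guess)
num_trials <- 10000

# replicate() evaluates the expression num_trials times and collects the results
num_correct_rep <- replicate(num_trials, sum(sample(guess, 8, replace = TRUE) == 'Y'))

head(num_correct_rep, 20)          # first 20 trials, same layout as num_correct
```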



In [None]:
hist(num_correct)        # histogram of all results

#### How can we use the data we just generated?
Using these results, what is the empirical probability that all 8 guesses would be correct?

In [None]:
all8right <- num_correct >= 8    #Logical vector: TRUE for each trial with all 8 guesses correct
head(all8right,30)               #Look at the first 30 entries


In [None]:
idx_8right <- which(all8right)   #Show vector index of entries with value TRUE
idx_8right


    

In [None]:
all8right[idx_8right]            #Show entries in all8right with above index values

num_correct[idx_8right]          #Show entries in num_correct in the same index values

In [None]:
num_correct[all8right]         #Same result with entire logical vector

In [None]:
count_all8right <- sum(all8right)  #How many trials had all 8 right?
count_all8right

length(num_correct[all8right])     #Second way to find same information

#### What is the empirical probability of getting all 8 correct, IF she is only guessing?
Also, what does this probability suggest?

In [None]:
count_all8right/num_trials
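
The same counting idea extends to every possible outcome, not just 8 correct. The sketch below regenerates the simulated data so that it runs on its own; `p_all8`, `emp_dist`, and the seed are names and values chosen for this illustration.

```r
set.seed(3350)                     # arbitrary seed, only for reproducibility
guess <- c('Y', 'N')
num_trials <- 10000
num_correct <- replicate(num_trials, sum(sample(guess, 8, replace = TRUE) == 'Y'))

# mean() of a logical vector is the proportion of TRUE values
p_all8 <- mean(num_correct == 8)

# Empirical probability of each outcome 0..8; factor() keeps outcomes with zero count
emp_dist <- table(factor(num_correct, levels = 0:8)) / num_trials
print(emp_dist)
```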

### Another Method

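Recall that when the quantity of interest follows a known distribution (here, the number of correct guesses is binomial with 8 trials and success probability 0.5), we can draw the simulated counts directly from that distribution instead of simulating each guess.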

In [None]:
successes <- rbinom(num_trials, size=8, prob=0.5)    #10000 trials, 8 teacups, 0.5 chance of guessing correctly
hist(successes)

In [None]:
# Empirical probability from binomial distribution random values simulation
emp_prob <- sum(successes>=8)/num_trials
cat("Simulation 2: ", emp_prob, "\n")

### Questions

1. What is the theoretical probability of getting exactly 8 correct guesses in 8 tries?
2. How does the theoretical probability compare to the empirical probability in the simulations above?



In [None]:
#1. Theoretical probability of exactly 8 correct in 8 guesses
theo <- dbinom(8,size=8,prob=0.5)
cat("Theoretical: ", theo, "\n")
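
The binomial value can be checked by hand: all 8 guesses must be correct, each with probability 0.5, so the probability is 0.5^8 = 1/256, about 0.0039. The sketch below compares that value to a fresh simulation; the seed and the variable names are additions for illustration, not part of the original notes.

```r
set.seed(3350)                                     # arbitrary seed, only for reproducibility
num_trials <- 10000

theo_by_hand <- 0.5^8                              # 8 independent correct guesses
theo_dbinom  <- dbinom(8, size = 8, prob = 0.5)    # same value from the binomial pmf

successes <- rbinom(num_trials, size = 8, prob = 0.5)
emp_prob  <- mean(successes == 8)                  # proportion of trials with all 8 correct

cat('By hand:    ', theo_by_hand, '\n')
cat('dbinom:     ', theo_dbinom, '\n')
cat('Empirical:  ', emp_prob, '\n')
cat('Difference: ', abs(emp_prob - theo_by_hand), '\n')
```

The empirical value will differ slightly from run to run, while the theoretical value is fixed.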



## What does it all mean??
* Remember that this is a conditional probability: it was computed under the assumption that she was guessing.
* How does the low probability inform your decision about whether our assumption (that she was guessing) is correct?
* These are the beginnings of a **hypothesis test**.
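
As a preview of where this leads, base R's `binom.test()` carries out an exact binomial test. The sketch below assumes a hypothetical outcome in which all 8 cups are identified correctly, and it tests the guessing assumption (success probability 0.5) against the one-sided alternative that she does better than guessing. The name `test_result` is introduced only for this illustration.

```r
# Hypothetical data: 8 correct identifications out of 8 cups
test_result <- binom.test(x = 8, n = 8, p = 0.5, alternative = 'greater')

# The p-value is P(8 or more correct | guessing), the same quantity as dbinom(8, size=8, prob=0.5)
test_result$p.value
```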