## One-Sample t-tests
One-sample t-tests compare the mean of some data with a known value. A z-test would require the population standard deviation (and therefore the true standard error). That is rarely available when all you have is sample data, so we will focus on t-tests rather than z-tests.
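For reference, the two statistics differ only in the denominator. The z-statistic uses the population standard deviation $\sigma$. The t-statistic substitutes the sample standard deviation $s$ and is compared against a t distribution with $n - 1$ degrees of freedom. In this note, $\mu_0$ is the value being tested against:

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \qquad\qquad t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$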

### Load Packages

In [None]:
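library(tidyverse)  ## data wrangling (%>%, filter, mutate, count) and plotting
library(DescTools)
library(plotrix)    ## std.error()
library(lsr)        ## cohensD()
library(pwr)        ## pwr.t.test(), pwr.p.test(), ES.h()
library(readxl)     ## read_xls()

options(repr.plot.width=5, repr.plot.height=5) ## set plot size within the notebook
## (this is only for Jupyter notebooks; you can disregard it elsewhere)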

We'll use the built-in `mtcars` dataset for this example.

In [None]:
head(mtcars)  ## remind ourselves of the setup of the mtcars dataset.

## One-Sample t-test - Means
We want to know if the average mpg of the cars in our sample `mtcars` is lower than the current national average of 24.7 mpg.

How would we run a hypothesis test for this?

### Step 1 - Formulate Hypothesis

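The value we are testing against is $\mu_0 = 24.7$ mpg, the national average specified in our question.

The sample mean is $\bar{x}$, which we will calculate from our data. It estimates $\mu$, the true mean mpg of the population our cars come from. Hypotheses are statements about $\mu$, not about $\bar{x}$:

$H_0 : \mu = 24.7$

$H_A : \mu < 24.7$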

**Note:** Given our $H_A$ we're running a one-tailed test.

### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

Random and independent sample ->>> assumed for now

Sample is <10% of the population? ->>> Yes

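The t-test also assumes the data are approximately normal, or that the sample is large enough for the Central Limit Theorem to apply. Let's look at our distribution and summary statistics to get an idea of our sample data.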

In [None]:
## IS OUR SAMPLE DISTRIBUTION NORMAL?
qqnorm(mtcars$mpg) ## draw the normal QQ plot of the sample quantiles
qqline(mtcars$mpg, col="red")  ## add a red reference line through the quartiles
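A QQ plot is a visual check. If you also want a numeric check, here is a minimal sketch using base R's `shapiro.test()` (Shapiro-Wilk). This is a supplement, not part of the original workflow. A p-value above alpha means we cannot reject normality. Small samples give the test little power, so treat it as a companion to the plot rather than a replacement.

```r
## Shapiro-Wilk test of normality (stats package, attached by default)
sw <- shapiro.test(mtcars$mpg)
sw$statistic  ## W statistic; values near 1 are consistent with normality
sw$p.value    ## compare with alpha = 0.05
```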


In [None]:
summary(mtcars$mpg) ## summary statistics - mean, median, min, max, etc.
sd(mtcars$mpg) ## standard deviation
std.error(mtcars$mpg) ## standard error

## show that the standard error is sd/sqrt(n)
sd(mtcars$mpg)/sqrt(length(mtcars$mpg))

## what is our n?
length(mtcars$mpg)

Our sample deviates from normality a bit in the tails, but overall it is a reasonable approximation of a normal distribution.

So we can already see that our sample mean is 20.09, which is lower than 24.7, but is the difference statistically significant?

### Step 3 - Calculate t-statistic and p-value

In [None]:
## one-sample t-test: t.test(VECTOR, mu = VALUE, alternative = "two.sided", "less", or "greater",
##                          conf.level = 1 - alpha)

t.test(mtcars$mpg, mu = 24.7, alternative = "less", conf.level = 0.95)

**So what we see in the output is:**

- our t-statistic (t-value) is -4.3263
- we have 31 degrees of freedom (n - 1)
- a p-value well below $\alpha = 0.05$, so we reject $H_0$: the mean mpg is significantly lower than 24.7
- a statement in words of our $H_A$
- a 95% CI for the "less than" hypothesis. Because we are only looking at the lower side of the distribution, this interval is one-sided: it starts at -Inf and has only an upper bound.
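To see where these numbers come from, here is a minimal sketch that reproduces the t-statistic and the one-sided p-value by hand in base R. The statistic is $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, where $\mu_0 = 24.7$ is the value under the null hypothesis. The lower-tail p-value is `pt(t, df = n - 1)`. The variable names are our own.

```r
cars_mpg <- mtcars$mpg   # sample data (built-in dataset)
mu0 <- 24.7              # value under the null hypothesis
n_obs <- length(cars_mpg)

se <- sd(cars_mpg) / sqrt(n_obs)          # standard error of the mean
t_manual <- (mean(cars_mpg) - mu0) / se   # one-sample t-statistic
p_manual <- pt(t_manual, df = n_obs - 1)  # lower-tail p-value for H_A: mu < 24.7

res <- t.test(cars_mpg, mu = mu0, alternative = "less")
c(manual = t_manual, t.test = unname(res$statistic))
c(manual = p_manual, t.test = res$p.value)
```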

If we want a 95% CI with both a lower and an upper bound, we run a two-sided test.

In [None]:
## one-sample t-test: t.test(VECTOR, mu = VALUE, alternative = "two.sided", "less", or "greater",
##                          conf.level = 1 - alpha)

t.test(mtcars$mpg, mu = 24.7, alternative = "two.sided", conf.level = 0.95)

This gives a 95% CI centered on our sample mean, but it also tests a different alternative hypothesis: whether the mean differs from 24.7 in either direction.
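The two-sided interval is $\bar{x} \pm t^{*} \cdot s/\sqrt{n}$. Here $t^{*}$ is the 97.5th percentile of the t distribution with $n - 1$ degrees of freedom. A minimal sketch that checks this against `t.test()`:

```r
cars_mpg <- mtcars$mpg
n_obs <- length(cars_mpg)
se <- sd(cars_mpg) / sqrt(n_obs)
t_star <- qt(0.975, df = n_obs - 1)                   # critical value for a 95% two-sided CI
ci_manual <- mean(cars_mpg) + c(-1, 1) * t_star * se  # lower and upper bounds

ci_ttest <- t.test(cars_mpg, mu = 24.7, alternative = "two.sided")$conf.int
ci_manual
ci_ttest
```

Note that 24.7 falls outside this interval, which agrees with rejecting $H_0$ in the two-sided test at $\alpha = 0.05$.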

### Effect Size

We now want to know how big the difference is between our sample mean and the population mean.  For this we will calculate Cohen's d.

In [None]:
cohensD(mtcars$mpg, mu = 24.7) ## lsr's cohensD() for one sample: our vector of data, and mu = the value we compare against
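For a one-sample test, Cohen's d is the mean difference expressed in standard-deviation units: $d = |\bar{x} - \mu_0| / s$, with $\mu_0 = 24.7$. Here is a minimal base-R sketch of that standard definition. Its result should match the `cohensD()` value above.

```r
cars_mpg <- mtcars$mpg
mu0 <- 24.7
d_manual <- abs(mean(cars_mpg) - mu0) / sd(cars_mpg)  # difference in SD units
round(d_manual, 3)
```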

Our "rule of thumb" for interpreting Cohen's d is:

| Cohen's d | Effect Size |
|:---------:|:-----------:|
| 0.20 | Small |
| 0.50 | Medium|
| 0.80 | Large |

So our value of 0.765 is close to the 0.80 benchmark for a large effect. In other words, the difference between the sample mean and the population mean is large relative to the spread of the data. Would you consider a difference of about 4.6 mpg important in the real world?

### Power

Finally, we want to see how much power our test has. `pwr.t.test()` links four quantities. To conduct a power analysis, we supply three of them and set the fourth to NULL:

- n: sample size
- d: effect size (in this case Cohen's d from `cohensD()`)
- sig.level: alpha
- power: power (1-$\beta$)

We also specify two settings:

- alternative: type of alternative hypothesis ("two.sided", "less", or "greater")
- type: right now we're only looking at one-sample tests ("one.sample")

Whichever quantity we set to NULL is calculated from the other inputs. Typically we either calculate the sample size *a priori*, before collecting data, or calculate the power of an analysis after conducting it.

We want to get the power for our mpg analysis, so we know our arguments will be:

- n = 32
- d = 0.765
- sig.level = 0.05
- power = NULL
- alternative = "two.sided" - Let's look at the more "conservative" version of the test.
- type = "one.sample"


In [None]:
x <- pwr.t.test(d = 0.765, n = 32, sig.level=0.05, power = NULL, alternative = "two.sided", type = "one.sample")
x
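Under the hood, power for a t-test comes from the noncentral t distribution. Here is a minimal base-R sketch. It assumes the standard setup for a two-sided one-sample test: noncentrality parameter $d\sqrt{n}$, $n - 1$ degrees of freedom, and rejection when $|t|$ exceeds the critical value. It should agree closely with the `pwr.t.test()` result. Small differences can appear because some implementations ignore the negligible opposite-tail term.

```r
d <- 0.765           # Cohen's d from above
n <- 32              # sample size
alpha <- 0.05
nu <- n - 1          # degrees of freedom
ncp <- d * sqrt(n)   # noncentrality parameter for a one-sample test
t_crit <- qt(1 - alpha / 2, nu)  # two-sided critical value

## probability of landing in either rejection region when the true effect is d
power_manual <- pt(t_crit, nu, ncp = ncp, lower.tail = FALSE) +
                pt(-t_crit, nu, ncp = ncp, lower.tail = TRUE)
power_manual
```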

Our power is extremely high. If we only wanted a power of 0.80, we would have needed a smaller sample, given our effect size:

In [None]:
pwr.t.test(d = 0.765, n = NULL, sig.level=0.05, power = 0.80, alternative = "two.sided", type = "one.sample")
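Base R's `stats::power.t.test()` offers a cross-check that needs no extra packages. It is parameterized by a raw difference `delta` and a standard deviation `sd`. Setting `sd = 1` makes `delta` equal to Cohen's d. Here is a minimal sketch that solves for n by leaving it unspecified:

```r
## n defaults to NULL, so power.t.test() solves for it
n_needed <- power.t.test(delta = 0.765, sd = 1, sig.level = 0.05, power = 0.80,
                         type = "one.sample", alternative = "two.sided")$n
ceiling(n_needed)   # round up: you cannot sample a fraction of a car
```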

## One-Sample t-test - Proportions

At the most basic level, a proportion is the mean of a vector of 0/1 observations. Given this, we can also use t-tests to compare a sample proportion with a known value.
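Here is a quick illustration with a made-up vector of five yes/no answers coded as 1/0 (hypothetical data, not from our survey):

```r
answers <- c(1, 0, 1, 1, 0)      # hypothetical responses: 1 = "yes", 0 = "no"
mean(answers)                    # 0.6: the mean of the 0/1 data
sum(answers) / length(answers)   # 0.6: the proportion of "yes" answers, the same number
```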

For this we'll use the small_gss dataset.

In [None]:
small_gss <- read_xls("small_gss.xls")
head(small_gss)

We'll look at the proportion of female respondents who say they support abortion for any reason (the `abany` variable). We'll compare our sample proportion with the 61% support reported by Pew: https://www.pewforum.org/fact-sheet/public-opinion-on-abortion/

Before we can conduct our analysis I'm going to do some quick data cleaning.

In [None]:
small_gss %>% count(abany) ## counts before recoding
df <- small_gss %>% ## save the recoded version as df
    filter(abany != "Not applicable") %>% ## drop rows where abany is missing (the question wasn't asked)
    filter(sex == "Female") %>% ## we've defined our population as women only, so keep just those observations
    mutate(abany = ifelse(abany == "Yes", 1, 0)) ## convert chr "Yes"/"No" to numeric 1/0
                                                 ## the format is ifelse(test, value_if_true, value_if_false)
df %>% count(abany) ## counts after recoding

### Step 1 - Formulate Hypothesis

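The value we are testing against is $p_u = 0.61$ (61%), specified in our question.

The sample proportion is $p_s$, which we will calculate from our data. It estimates $p$, the true proportion among the population of women we sampled. Our hypotheses are statements about $p$:

$H_0 : p = 0.61$

$H_A : p \neq 0.61$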

Given our $H_A$ we're running a two-tailed test.

### Step 2 - Prepare and Check Conditions

Set alpha ->>> $\alpha = 0.05$

Random and independent sample ->>> assumed for now

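Sample is <10% of the population? ->>> Yes

At least 10 expected "successes" and "failures" ($n p_u \ge 10$ and $n(1 - p_u) \ge 10$)? ->>> check once we know $n$

Let's take a look at our distribution and summary statistics first to get an idea of our sample data.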

In [None]:
mean(df$abany) ## the mean of a 0/1 variable is the proportion saying "yes"
p <- mean(df$abany) ## save the proportion to use below

std.error(df$abany) ## standard error (sd/sqrt(n), where sd() divides by n - 1)
## compare with sqrt(p_s(1 - p_s)/n); the two agree closely when n is large
sqrt(p*(1-p)/length(df$abany))

## what is our n?
length(df$abany)
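The next sketch runs two quick checks on this step, using hypothetical counts (47 "yes" and 53 "no"; swap in your own data). First, `std.error()` is based on `sd()`, which divides by $n - 1$. So it equals $\sqrt{p_s(1-p_s)/n}$ only after multiplying by $\sqrt{n/(n-1)}$. Second, the usual success-failure condition for the normal approximation asks for at least 10 expected successes and 10 expected failures under $H_0$.

```r
responses <- rep(c(1, 0), times = c(47, 53))  # hypothetical: 47 "yes", 53 "no"
n_obs <- length(responses)
p_s <- mean(responses)                        # sample proportion, 0.47

se_sd  <- sd(responses) / sqrt(n_obs)         # sd/sqrt(n), as plotrix::std.error() computes
se_bin <- sqrt(p_s * (1 - p_s) / n_obs)       # textbook binomial standard error
all.equal(se_sd, se_bin * sqrt(n_obs / (n_obs - 1)))  # they differ only by sqrt(n/(n-1))

p_u <- 0.61                                   # value under the null hypothesis
c(successes = n_obs * p_u, failures = n_obs * (1 - p_u))  # both should be >= 10
```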

So we can already see that our sample proportion is 0.47, which seems much different from 0.61, but is the difference statistically significant?

(SPOILER ALERT: it has to be, like really...)

### Step 3 - Calculate t-statistic and p-value

The standard way to compare a single sample proportion with a hypothesized probability is `prop.test()`.

#### prop.test()
`prop.test()` takes these main arguments:

- x = the number of "yes"es, "successes", or ones
- n = the total number of observations
- p = the probability to test against (if left as NULL, a single proportion is tested against 0.5)
- alternative = the type of test to conduct ("two.sided", "less", or "greater"), depending on your alternative hypothesis
- conf.level = the confidence level of the reported interval (default 0.95)
- correct = whether to apply Yates' continuity correction (default TRUE)

In [None]:
# prop.test(x, n, p = NULL, alternative = "two.sided", conf.level = 0.95, correct = TRUE)
x <- sum(df$abany) ## summing a 0/1 variable gives you the count of "yes"es
n <- length(df$abany) ## the length of a vector gives you the number of values in it
p <- 0.61 ## the probability we want to test against
prop.test(x, n, p = p, alternative = "two.sided")

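Our output shows a chi-squared statistic, not a t-value! That's because `prop.test()` works with the counts (frequencies) of successes and failures and compares them with the counts expected under $H_0$. For a single proportion, this chi-squared statistic (1 degree of freedom) is the square of the familiar z-statistic for a proportion, apart from the continuity correction. Our extremely low p-value indicates that we can reject the null hypothesis (no surprise).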
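To see the link between the chi-squared and z statistics, here is a minimal sketch with hypothetical counts (47 successes out of 100, not our GSS data). With the continuity correction turned off, the X-squared value from `prop.test()` equals $z^2$, where $z = (p_s - p_u)/\sqrt{p_u(1 - p_u)/n}$. The p-values match as well.

```r
x_yes <- 47    # hypothetical number of "yes" answers
n_obs <- 100   # hypothetical sample size
p_u   <- 0.61  # value under the null hypothesis

z <- (x_yes / n_obs - p_u) / sqrt(p_u * (1 - p_u) / n_obs)  # one-sample z for a proportion
res <- prop.test(x_yes, n_obs, p = p_u, correct = FALSE)     # no Yates correction

c(z_squared = z^2, X_squared = unname(res$statistic))  # the same number
c(z_test = 2 * pnorm(-abs(z)), prop.test = res$p.value) # the same two-sided p-value
```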

#### t.test() with 0/1 data

Given that our proportion is a mean of a 0/1 variable, we can simply use our t-test function and obtain similar results.

In [None]:
prop_t <- t.test(df$abany, mu = 0.61, alternative = "two.sided", conf.level = 0.95) ### remember we can save our results
prop_t ## and then print them

We see almost identical results, but this time we have a t-value and our alternative hypothesis references a mean of 0.61.

In that block, I saved the results to an object called `prop_t`. Remember from the chi-square test that the saved object holds all of the information output by the test. Let's look at the structure of that object.

In [None]:
str(prop_t) ## look at the structure of the t.test output

We get all of the "pieces" of the result as parts of the object. It's a named list (of class `htest`), so we can use `$` and a component's name to pull out any single piece of information we want.

In [None]:
prop_t$statistic # t value/statistic
prop_t$parameter # degrees of freedom
prop_t$p.value # p-value
prop_t$conf.int # confidence interval
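A quick way to list every component you can extract is `names()`. This minimal sketch reuses the `mtcars` test from earlier, so it runs on its own:

```r
res <- t.test(mtcars$mpg, mu = 24.7, alternative = "less")
class(res)     # "htest": the class base R uses for test results
names(res)     # the names you can use with $, e.g. res$p.value
res$estimate   # the sample mean
```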

### Effect Size - Comparing Proportions

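To compare our proportions, we should look at Cohen's h, the difference between the two proportions after an arcsine transformation. Its absolute value measures the magnitude of the difference; its sign only tells you which proportion is larger.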

In [None]:
### The function is ES.h(ps,pu) - remember ps is sample prop. and pu is population (universe) prop.

ES.h(mean(df$abany),0.61)
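Cohen's h is defined as $h = 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_2}$, which is what `pwr::ES.h(p1, p2)` computes. This minimal sketch computes it by hand with the rounded proportions from this example (0.47 vs 0.61). `cohens_h()` is a hypothetical helper written here for illustration.

```r
cohens_h <- function(p1, p2) {
  # arcsine-transform each proportion, then take the difference
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}
h <- cohens_h(0.47, 0.61)
h        # negative because 0.47 < 0.61
abs(h)   # magnitude, to compare with the 0.2 / 0.5 / 0.8 benchmarks
```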


In [None]:
cohensD(df$abany, mu = 0.61)  ## try Cohen's d, the effect size for a difference in means

We calculated Cohen's h, for proportions, and also Cohen's d for comparison (since the mean of 0/1 data is a proportion). We got very similar results in terms of absolute magnitude. The Cohen's h result is negative because `ES.h()` returns a signed difference and our sample proportion is below 0.61, while `cohensD()` reports only the magnitude. However, I wouldn't rely on Cohen's d for proportion data unless you find that the two are virtually identical.

### Power

We can also calculate the power of a comparison of proportions. All of the arguments are similar to those above, except that we use a different function (`pwr.p.test()`, not `pwr.t.test()`) and the effect size is `h`, not `d`.

In [None]:
h1 <- ES.h(mean(df$abany), 0.61)
n1 <- length(df$abany)


pwr.p.test(h = h1, n= n1, sig.level=0.05, power=NULL, alternative="two.sided")
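For a one-sample proportion test, power can be approximated with the normal distribution, where $h\sqrt{n}$ is how far the test statistic is shifted under $H_A$. Here is a minimal sketch of the two-sided calculation under that assumption. `power_prop()` is a hypothetical helper, and the sample sizes are hypothetical; plug in `n1` from above to compare with `pwr.p.test()`.

```r
power_prop <- function(h, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)   # two-sided critical value
  shift  <- h * sqrt(n)            # shift of the test statistic under H_A
  pnorm(z_crit - shift, lower.tail = FALSE) + pnorm(-z_crit - shift)
}
h <- 2 * asin(sqrt(0.47)) - 2 * asin(sqrt(0.61))  # Cohen's h, about -0.28
power_prop(h, n = 20)    # hypothetical small sample: low power
power_prop(h, n = 500)   # hypothetical large sample: power close to 1
```

With no effect (`h = 0`), the function returns exactly alpha, as a two-sided test should.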

With an effect of this size and a sample this large, our power is virtually 1. This means the probability of a Type II error is essentially zero. Let's compare this with the result from the `pwr.t.test()` function.

In [None]:
d1 <- cohensD(df$abany, mu = 0.61)

pwr.t.test(d = d1, 
           n = n1, ## same n1 we set in the block above
           sig.level=0.05, 
           power = NULL, 
           alternative = "two.sided", 
           type = "one.sample")

We get essentially the same result.

### BONUS CONTENT: Power Plots

A power plot shows how power changes across a range of sample sizes, holding the effect size and alpha fixed. The "optimal" sample size line marks the number of observations you would need to achieve the power specified in the function call.

In [None]:
power_plot_data <- pwr.t.test(d = 0.30, n = NULL, sig.level = 0.05, power = 0.8, alternative = "two.sided",
                              type = "one.sample") ## type defaults to "two.sample" if omitted
plot(power_plot_data)
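If you want the same kind of curve without `pwr`'s plotting method, here is a minimal base-R sketch using `stats::power.t.test()`. With `sd = 1`, `delta` is Cohen's d. The effect size of 0.30 and the grid of sample sizes are illustrative choices.

```r
ns <- 5:200                                 # sample sizes to evaluate
power_by_n <- sapply(ns, function(n)
  power.t.test(n = n, delta = 0.30, sd = 1, sig.level = 0.05,
               type = "one.sample", alternative = "two.sided")$power)

plot(ns, power_by_n, type = "l", xlab = "Sample size (n)", ylab = "Power",
     main = "One-sample t-test, d = 0.30")
abline(h = 0.80, lty = 2, col = "red")      # conventional 0.80 target
n_80 <- ns[which(power_by_n >= 0.80)[1]]    # smallest n in the grid reaching 0.80
abline(v = n_80, lty = 2)
n_80
```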