**Reference**: <http://daviddalpiaz.github.io/appliedstats/probability-and-statistics-in-r.html#hypothesis-tests-in-r>

# Parametric Test

A parametric test makes assumptions about the distribution the sample comes from (for example, that it is normal). If these assumptions are not met, the results and conclusions of the test may be misleading or incorrect.
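As a rough illustration of checking such an assumption before running a test, the sketch below applies the Shapiro-Wilk test from base R. The choice of `shapiro.test()` is an assumption of this example, not something the reference prescribes, and with only 9 observations the check has little power. The data are the Captain Crisp weights used later in these notes.

```r
# Box weights from the Captain Crisp example later in these notes
weights <- c(15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2)

# Shapiro-Wilk test: the null hypothesis is that the data come from a normal distribution
# (choosing this particular check is an assumption made for illustration)
normality_check <- shapiro.test(weights)

# A small p-value would be evidence against normality
normality_check$p.value
```

A large p-value here does not prove normality; it only means the sample shows no strong evidence against it.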

# T-test

**Reference**:
- [MarinStatsLectures: T-test equal vs unequal variance assumptions](https://www.youtube.com/watch?v=ikS7itcmWZM&list=PLqzoL9-eJTNBq-C2sh46hYIlZYJ0Z1cIB&index=11)

- Two independent samples T-test with equal variances: `t.test(var.equal = TRUE)`
- Two independent samples T-test with unequal variances (Welch, the default): `t.test()`
- Matched pairs T-test: `t.test(paired = TRUE)`
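The three variants can be compared side by side on the same data. This is a minimal sketch: the vectors `group_a` and `group_b` are hypothetical values made up for illustration, and treating them as paired only makes sense when each position really is the same subject measured twice.

```r
# Hypothetical measurements, made up for illustration only
group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
group_b <- c(4.8, 5.2, 5.0, 4.7, 5.5, 4.9)

# Pooled-variance test: assumes both groups share one variance
pooled <- t.test(group_a, group_b, var.equal = TRUE)

# Welch test (the default): does not assume equal variances
welch <- t.test(group_a, group_b)

# Paired test: works on the element-wise differences group_a - group_b
paired <- t.test(group_a, group_b, paired = TRUE)

# Degrees of freedom differ between the three variants
c(pooled = unname(pooled$parameter),
  welch  = unname(welch$parameter),
  paired = unname(paired$parameter))
```

The pooled test uses `n1 + n2 - 2` degrees of freedom, the paired test uses `n - 1`, and the Welch test uses an approximation that is usually not an integer.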

In [1]:
# ?t.test

### One sample T-test

>Suppose a grocery store sells “16 ounce” boxes of Captain Crisp cereal. A random sample of 9 boxes was taken and weighed. The weight in ounces are stored in the data frame capt_crisp.

In [6]:
capt_crisp = data.frame(weights = c(15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2))
capt_crisp

weights
<dbl>
15.5
16.2
16.1
15.8
15.6
16.0
15.8
15.9
16.2


>The company that makes Captain Crisp cereal claims that the average weight of a box is at least 16 ounces. We will assume the weight of cereal in a box is normally distributed and use a 0.05 level of significance to test the company’s claim.

In [15]:
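# sample mean and sample standard deviation
x_bar <- mean(capt_crisp$weights)
s <- sd(capt_crisp$weights)

# hypothesized mean under the null hypothesis
mu <- 16
# sample size
n <- nrow(capt_crisp)
# degrees of freedom
dg <- n - 1

# test statistic: (sample mean - mu) / standard error
t_stat <- (x_bar - mu) / (s / sqrt(n))

# left-tailed test: P(T <= t_stat)
p_value <- pt(t_stat, dg)

p_value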

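The p-value (about 0.13) is greater than the 0.05 significance level, so we fail to reject the null hypothesis: the sample does not give enough evidence against the company's claim.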
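The same decision can be reached by comparing the test statistic with a critical value instead of computing a p-value. The sketch below assumes the left-tailed test at the 0.05 level described above; the variable names are chosen for this example only.

```r
weights <- c(15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2)
n <- length(weights)
alpha <- 0.05

# observed test statistic for H0: mu >= 16 against H1: mu < 16
t_stat <- (mean(weights) - 16) / (sd(weights) / sqrt(n))

# critical value: the 5th percentile of the t distribution with n - 1 degrees of freedom
t_critical <- qt(alpha, df = n - 1)

# reject H0 only if the statistic falls below the critical value
reject_null <- t_stat < t_critical

c(t_stat = t_stat, t_critical = t_critical)
reject_null
```

Here the statistic (about -1.2) does not fall below the critical value (about -1.86), so the null hypothesis is not rejected, in agreement with the p-value approach.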

---
Using the built-in `t.test()` function:

In [17]:
report <- t.test(capt_crisp$weights, alternative = 'less', mu = 16, conf.level = .95)
report


	One Sample t-test

data:  capt_crisp$weights
t = -1.2, df = 8, p-value = 0.1322
alternative hypothesis: true mean is less than 16
95 percent confidence interval:
     -Inf 16.05496
sample estimates:
mean of x 
     15.9 


In [20]:
names(report)

In [19]:
str(report)

List of 10
 $ statistic  : Named num -1.2
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 8
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.132
 $ conf.int   : num [1:2] -Inf 16.1
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num 15.9
  ..- attr(*, "names")= chr "mean of x"
 $ null.value : Named num 16
  ..- attr(*, "names")= chr "mean"
 $ stderr     : num 0.0833
 $ alternative: chr "less"
 $ method     : chr "One Sample t-test"
 $ data.name  : chr "capt_crisp$weights"
 - attr(*, "class")= chr "htest"


In [24]:
# p-value
report$p.value
# confidence interval
report$conf.int
# test statistic
report$statistic
# degrees of freedom
report$parameter
# an estimate of the true mean
report$estimate

We can also run a two-sided test:

In [25]:
t.test(capt_crisp$weights, alternative = 'two.sided', mu = 16)


	One Sample t-test

data:  capt_crisp$weights
t = -1.2, df = 8, p-value = 0.2645
alternative hypothesis: true mean is not equal to 16
95 percent confidence interval:
 15.70783 16.09217
sample estimates:
mean of x 
     15.9 
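Because the t distribution is symmetric, the two-sided p-value is twice the tail area beyond the observed statistic, which is why 0.2645 is double the one-sided 0.1322. A short check, computed by hand from the same weights:

```r
weights <- c(15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2)
n <- length(weights)

# same test statistic as in the one-sided test
t_stat <- (mean(weights) - 16) / (sd(weights) / sqrt(n))

# two-sided p-value: area in both tails beyond |t_stat|
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)
p_two_sided
```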


### Two samples T-test

In [32]:
sample1 <- rnorm(100, mean = 10, sd = 3)
sample2 <- rnorm(100, mean = 10, sd = 5)
sample3 <- rnorm(200, mean = 8, sd = 5)

In [33]:
t.test(sample1, sample2)


	Welch Two Sample t-test

data:  sample1 and sample2
t = 1.5578, df = 163.58, p-value = 0.1212
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2286509  1.9379163
sample estimates:
mean of x mean of y 
10.185786  9.331154 


In [35]:
# two-sample t-test assuming equal variances, with a 99% confidence interval
t.test(sample2, sample3, conf.level = .99, var.equal = TRUE)


	Two Sample t-test

data:  sample2 and sample3
t = 2.2985, df = 298, p-value = 0.02222
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -0.1799432  2.9944041
sample estimates:
mean of x mean of y 
 9.331154  7.923923 


In [36]:
x = c(70, 82, 78, 74, 94, 82)
y = c(64, 72, 60, 76, 72, 80, 84, 68)
t_test_data = data.frame(values = c(x, y),
                         group  = c(rep("A", length(x)), rep("B", length(y))))
t_test_data

values,group
<dbl>,<chr>
70,A
82,A
78,A
74,A
94,A
82,A
64,B
72,B
60,B
76,B
72,B
80,B
84,B
68,B


In [37]:
t.test(values ~ group, data = t_test_data)


	Welch Two Sample t-test

data:  values by group
t = 1.8132, df = 10.693, p-value = 0.09794
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.745121 17.745121
sample estimates:
mean in group A mean in group B 
             80              72 
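To see where the Welch statistic and its fractional degrees of freedom come from, the values reported above can be reproduced by hand. This is a sketch using the Welch-Satterthwaite approximation for the degrees of freedom; the variable names are chosen for this example.

```r
x <- c(70, 82, 78, 74, 94, 82)
y <- c(64, 72, 60, 76, 72, 80, 84, 68)

# estimated variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_welch <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# two-sided p-value
p_welch <- 2 * pt(-abs(t_welch), df_welch)

c(t = t_welch, df = df_welch, p.value = p_welch)
```

The results should match the `t`, `df`, and `p-value` printed by `t.test(values ~ group, data = t_test_data)`.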


# Test for Proportion

In [2]:
# ?prop.test

# Chi squared test

In [3]:
# ?chisq.test