# Handout 5. One- and Two-Sample Tests
## 1. One-sample $t$-test
The $t$ tests assume that the data come from a normal distribution. In the one-sample case we assume that the data $x_1, \dots, x_n$ are independent observations of a normal random variable with mean $\mu$ and variance $\sigma^2$, and we wish to test the null hypothesis that $\mu=\mu_0$.

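Consider an example concerning daily energy intake in kJ for 11 women (Altman, 1991, p. 183). We test whether the mean intake differs from $\mu_0 = 7725$ kJ, the value passed as `mu` to `t.test` below. First, the values are placed in a data vector: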

In [1]:
daily.intake <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515, 8230,8770)
mean(daily.intake)

In [2]:
sd(daily.intake)

In [3]:
quantile(daily.intake)

In [4]:
res<-t.test(daily.intake,mu=7725)
names(res)

In [5]:
res$parameter   # degrees of freedom; the full name avoids relying on partial matching

In [6]:
res$conf.int

In [7]:
res$statistic

In [8]:
res$p.value

In [9]:
res$method
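
The quantities returned by `t.test` can be reproduced by hand from $t = (\bar{x}-\mu_0)/(s/\sqrt{n})$ with $n-1$ degrees of freedom. The following is a minimal sketch; the names `mu0`, `sem`, `t.stat`, `p.val`, and `ci` are chosen here for illustration and do not appear in the handout.

```r
# Energy-intake data from the example above
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
mu0 <- 7725                                   # hypothesized mean under the null

n      <- length(daily.intake)
sem    <- sd(daily.intake) / sqrt(n)          # standard error of the mean
t.stat <- (mean(daily.intake) - mu0) / sem    # one-sample t statistic
df     <- n - 1                               # degrees of freedom
p.val  <- 2 * pt(-abs(t.stat), df)            # two-sided p-value

# 95% confidence interval for the mean
ci <- mean(daily.intake) + c(-1, 1) * qt(0.975, df) * sem

# Compare with the values returned by t.test()
res <- t.test(daily.intake, mu = mu0)
c(by.hand = t.stat, t.test = unname(res$statistic))
c(by.hand = p.val,  t.test = res$p.value)
```

The two columns of each printed vector should agree, which confirms that `res$statistic`, `res$p.value`, and `res$conf.int` are just these textbook formulas.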

## 2. Wilcoxon signed-rank test
The $t$ tests are fairly robust against departures from the normal distribution, especially in larger samples, but sometimes you wish to avoid making that assumption altogether. Distribution-free methods are convenient for this purpose.

For the one-sample Wilcoxon test, the procedure is to subtract the theoretical $\mu_0$ and rank the differences according to their numerical value, ignoring the sign, and then calculate the sum of the positive or negative ranks. The point is that, assuming only that the distribution is symmetric around $\mu_0$, the test statistic corresponds to selecting each number from 1 to $n$ with probability 1/2 and calculating the sum. The distribution of the test statistic can be calculated exactly, at least in principle. It becomes computationally excessive in large samples, but the distribution is then very well approximated by a normal distribution.

In [10]:
res2 <- wilcox.test(daily.intake, mu=7725)
names(res2)

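Because the value 7515 occurs twice, the absolute differences from 7725 contain a tie, and R issues the warning “cannot compute exact p-value with ties”. The p-value is then based on the normal approximation rather than on the exact distribution.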
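
The ranking procedure described above can be carried out step by step. This is a minimal sketch under the handout's setup; the names `d`, `r`, `V`, `null.mean`, `null.sd`, and `z` are illustrative, and the null variance formula used here ignores the tie correction that `wilcox.test` applies internally.

```r
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
mu0 <- 7725

d <- daily.intake - mu0        # differences from the hypothesized value
d <- d[d != 0]                 # zero differences are dropped (none occur here)
r <- rank(abs(d))              # rank by absolute value; ties get average ranks
V <- sum(r[d > 0])             # sum of the ranks of the positive differences

# Under the null, each rank 1..n enters the sum with probability 1/2
n         <- length(d)
null.mean <- n * (n + 1) / 4
null.sd   <- sqrt(n * (n + 1) * (2 * n + 1) / 24)   # assumes no ties
z         <- (V - null.mean) / null.sd              # rough normal score

# wilcox.test() reports the same statistic under the name V
res2 <- suppressWarnings(wilcox.test(daily.intake, mu = mu0))
c(by.hand = V, wilcox.test = unname(res2$statistic))
```

Only two of the eleven differences are positive, so `V` falls well below its null mean, in line with the small $t$ statistic found in Section 1.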

## 3. Two-sample $t$-test
The two-sample $t$ test is used to test the hypothesis that two samples may be assumed to come from distributions with the same mean. 

The theory for the two-sample $t$ test is not very different in principle from that of the one-sample test. Data are now from two groups, $x_{11}, x_{12}, \dots, x_{1n_1}$ and $x_{21}, x_{22}, \dots, x_{2n_2}$, which we assume are sampled from the normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, and it is desired to test the null hypothesis $\mu_1=\mu_2$. You then calculate
$$t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\mbox{SEM}_1^2+\mbox{SEM}_2^2}},$$

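where $\mbox{SEM}_i = s_i/\sqrt{n_i}$ is the standard error of the mean in group $i$, with $s_i$ the sample standard deviation and $n_i$ the sample size of that group.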
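
The statistic above can be checked against `t.test`, whose default is the Welch version with unequal variances. The sketch below uses simulated data, which is an assumption made only so that the example runs without the stock file; the vectors `x1` and `x2` and all other names are hypothetical.

```r
set.seed(1)                          # hypothetical data, for illustration only
x1 <- rnorm(30, mean = 0.0, sd = 1)
x2 <- rnorm(40, mean = 0.5, sd = 2)

sem1 <- sd(x1) / sqrt(length(x1))    # standard error of the mean, group 1
sem2 <- sd(x2) / sqrt(length(x2))    # standard error of the mean, group 2
sedm <- sqrt(sem1^2 + sem2^2)        # standard error of the difference of means
t.stat <- (mean(x1) - mean(x2)) / sedm

# Welch-Satterthwaite approximation to the degrees of freedom
df    <- sedm^4 / (sem1^4 / (length(x1) - 1) + sem2^4 / (length(x2) - 1))
p.val <- 2 * pt(-abs(t.stat), df)    # two-sided p-value

res <- t.test(x1, x2)                # var.equal = FALSE is the default
c(by.hand = t.stat, t.test = unname(res$statistic))
c(by.hand = df,     t.test = unname(res$parameter))
```

The degrees of freedom are generally not an integer in the Welch test, which is why `t.test` reports a fractional `df` unless `var.equal=TRUE` is given.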

In [11]:
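# Read the stock log-return data; the first row holds the column names
ret <- read.table("http://www.ams.sunysb.edu/~songwu/AMS561/d_logret_6stocks.txt", header=TRUE)
attach(ret)                     # makes columns such as Pfizer and Intel accessible by name
res3 <- t.test(Pfizer, Intel)   # Welch test (unequal variances) by default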
names(res3)

In [12]:
res3$statistic

In [13]:
t.test(Pfizer, Intel, var.equal=TRUE)   # pooled-variance two-sample t test


	Two Sample t-test

data:  Pfizer and Intel
t = 0.21707, df = 126, p-value = 0.8285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.01578133  0.01966986
sample estimates:
   mean of x    mean of y 
-0.004041315 -0.005985579 
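
With `var.equal=TRUE` the two sample variances are pooled into one estimate, and the test has $n_1+n_2-2$ degrees of freedom (126 in the output above). The following sketch reproduces that calculation on simulated data; the data and all variable names are hypothetical and serve only to make the example runnable.

```r
set.seed(2)                          # hypothetical data, for illustration only
x1 <- rnorm(25, mean = 10, sd = 3)
x2 <- rnorm(35, mean = 11, sd = 3)
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: weighted average of the two sample variances
sp2    <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t.stat <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df     <- n1 + n2 - 2
p.val  <- 2 * pt(-abs(t.stat), df)   # two-sided p-value

res <- t.test(x1, x2, var.equal = TRUE)
c(by.hand = t.stat, t.test = unname(res$statistic))
```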


**Exercise:**
 - Perform a one-sample $t$ test for ‘Citigroup’ with the null hypothesis that the mean is zero.
 - Perform the Wilcoxon signed-rank test for ‘Citigroup’.
 - Perform the two-sample $t$ test for ‘Pfizer’ and ‘Citigroup’.

## 4. Comparison of variances

In [14]:
var.test(Pfizer, Intel)


	F test to compare two variances

data:  Pfizer and Intel
F = 0.11577, num df = 63, denom df = 63, p-value = 3.703e-15
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.07033263 0.19055829
sample estimates:
ratio of variances 
          0.115769 
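
In the output above, the very small p-value indicates that the two variances differ, which favors the default Welch test over `var.equal=TRUE` for these two series. The F statistic itself is simply the ratio of the sample variances. A minimal sketch on simulated data follows; the data and the names `F.stat`, `df1`, `df2`, `p.val`, and `ci` are hypothetical.

```r
set.seed(3)                          # hypothetical data, for illustration only
x1 <- rnorm(64, sd = 1)
x2 <- rnorm(64, sd = 3)

F.stat <- var(x1) / var(x2)          # ratio of the sample variances
df1    <- length(x1) - 1             # numerator degrees of freedom
df2    <- length(x2) - 1             # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability
p.lower <- pf(F.stat, df1, df2)
p.val   <- 2 * min(p.lower, 1 - p.lower)

# 95% confidence interval for the ratio of the true variances
ci <- F.stat / qf(c(0.975, 0.025), df1, df2)

res <- var.test(x1, x2)
c(by.hand = F.stat, var.test = unname(res$statistic))
```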


## 5. Two-sample Wilcoxon test

In [15]:
wilcox.test(Pfizer, Intel)


	Wilcoxon rank sum test with continuity correction

data:  Pfizer and Intel
W = 2019, p-value = 0.892
alternative hypothesis: true location shift is not equal to 0
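
The statistic `W` reported by `wilcox.test` is the sum of the ranks of the first sample in the combined data, minus its smallest possible value $n_1(n_1+1)/2$. Under the null hypothesis its mean is $n_1 n_2/2$; with 64 observations per series, as the 63 degrees of freedom in Section 4 imply, that is 2048, close to the observed 2019. The sketch below computes `W` by hand on simulated data; the data and variable names are hypothetical.

```r
set.seed(4)                          # hypothetical data, for illustration only
x1 <- rnorm(20)
x2 <- rnorm(25, mean = 0.5)
n1 <- length(x1)
n2 <- length(x2)

r <- rank(c(x1, x2))                 # ranks in the combined sample
W <- sum(r[seq_along(x1)]) - n1 * (n1 + 1) / 2   # rank sum of x1, shifted to start at 0

null.mean <- n1 * n2 / 2             # expected value of W under the null

res <- wilcox.test(x1, x2)
c(by.hand = W, wilcox.test = unname(res$statistic))
```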
