# $T^{2}$ Hotelling Test

Tests hypothesis:

> $H_{0}: \mathbf{\mu} = \mathbf{\mu_{0}}$

Vs

> $H_{1}: \mathbf{\mu} \neq \mathbf{\mu_{0}}$

(Here $\mathbf{\mu}$ is the $p \times 1$ population mean vector.)


If $\mathbf{X_{1}},\mathbf{X_{2}},...,\mathbf{X_{n}}$ are a random sample from $N_{p}(\mu, \Sigma)$, where $\mu$ is the $p \times 1$ mean vector and $\Sigma$ is the $p \times p$ covariance matrix, then:

1. The test statistic is the multivariate analogue of the squared t-statistic; after rescaling, it follows an F distribution.
2. This statistic is the $T^{2}$ Hotelling statistic
3. $T^{2} = (n-1)(\bar{\mathbf{x}} - \mu_{0})^{T}S^{-1}(\bar{\mathbf{x}} - \mu_{0})$
4. Here $S = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_{i} - \bar{\mathbf{x}})(\mathbf{x}_{i} - \bar{\mathbf{x}})^{T}$ is the biased MLE of $\Sigma$ (divisor $n$)
5. Under $H_{0}$, $T^{2}$ follows Hotelling's $T^{2}_{p}(n-1)$ distribution: dimension $p$, with $n-1$ degrees of freedom (those of the covariance estimate from a sample of size $n$)
6. When $n>p$, i.e. the sample size exceeds the number of variables, $T^{2}_{p}(n-1) = \dfrac{(n-1)p}{n-p}F_{p, n-p}$
7. **You can therefore run the hypothesis test, and reject $H_{0}$ at the $100\alpha\%$ level of significance if:**

> $T^{2} > \dfrac{(n-1)p}{n-p}F_{p, n-p, \alpha}$

**Equivalently, rescale $T^{2}$ into an F-statistic, $F = T^{2}\dfrac{n-p}{(n-1)p}$, and reject $H_{0}$ if $F > F_{p, n-p, \alpha}$.**
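The procedure above can be wrapped in a small helper. This is a minimal sketch, not a library function: `hotelling_T2` is a hypothetical name. It uses R's unbiased `var()` together with the factor $n$, which gives the same $T^{2}$ as the MLE form with factor $n-1$.

```r
# Minimal sketch of a one-sample Hotelling T^2 test (hypothetical helper).
hotelling_T2 <- function(X, mu0, alpha = 0.05) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  stopifnot(n > p, length(mu0) == p)

  d   <- colMeans(X) - mu0            # x.bar - mu_0
  S.u <- var(X)                       # unbiased covariance (divisor n - 1)
  T2  <- n * drop(t(d) %*% solve(S.u, d))

  # Rescale T^2 to an F(p, n - p) statistic
  F.stat  <- T2 * (n - p) / ((n - 1) * p)
  p.value <- pf(F.stat, p, n - p, lower.tail = FALSE)
  # Critical value on the T^2 scale: (n - 1) p / (n - p) * F_{p, n-p, alpha}
  T2.crit <- (n - 1) * p / (n - p) * qf(alpha, p, n - p, lower.tail = FALSE)

  list(T2 = T2, F = F.stat, df = c(p, n - p), p.value = p.value,
       T2.crit = T2.crit, reject = T2 > T2.crit)
}

boys <- data.frame(
  b.height = c(78, 76, 92, 81, 81, 84),
  b.chest  = c(60.6, 58.1, 63.2, 59, 60.8, 59.5),
  b.muac   = c(16.5, 12.5, 14.5, 14, 15.5, 14)
)
res <- hotelling_T2(boys, mu0 = c(87.8, 58.4, 15.9), alpha = 0.01)
res$T2       # about 271.61
res$p.value  # about 0.0041
```

`solve(S.u, d)` solves the linear system directly rather than forming the inverse first, which is the usual idiom in R when only $S_{U}^{-1}(\bar{\mathbf{x}} - \mu_{0})$ is needed.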

## Example

Measurements of 2-year-old boys from high-altitude regions of Asia:

1. Height
2. Chest Circumference
3. MUAC: mid-upper arm circumference

(all measured in cm)

For low-altitude boys of the same age, the means are 87.8, 58.4, and 15.9 cm, respectively.

We want to test whether the high-altitude group has the same means, i.e. the hypothesis that:

> $H_{0}: \mu = \mu_{0} = [87.8, 58.4, 15.9]^{T}$

---

### Let's load the high-altitude data

```r
b.height <- c(78, 76, 92, 81, 81, 84)
b.chest <- c(60.6, 58.1, 63.2, 59, 60.8, 59.5)
b.muac <- c(16.5, 12.5, 14.5, 14, 15.5, 14)
boys <- data.frame(b.height, b.chest, b.muac)

n <- nrow(boys)  # sample size: 6 boys
p <- ncol(boys)  # number of variables: 3
```

### We have 3 variables and 6 observations

> $p = 3$

> $n = 6$

```r
boys
```

| b.height | b.chest | b.muac |
|---:|---:|---:|
| 78 | 60.6 | 16.5 |
| 76 | 58.1 | 12.5 |
| 92 | 63.2 | 14.5 |
| 81 | 59.0 | 14.0 |
| 81 | 60.8 | 15.5 |
| 84 | 59.5 | 14.0 |


### And now let's calculate our $T^{2}$ test statistic

> $T^{2} = (n-1)(\bar{\mathbf{x}} - \mu_{0})^{T}S^{-1}(\bar{\mathbf{x}} - \mu_{0})$

**Note: R's `var()` returns the unbiased estimator $S_{U} = \frac{n}{n-1}S$ (divisor $n-1$), not the MLE $S$.**

**Because $(n-1)S^{-1} = nS_{U}^{-1}$, the value of $T^{2}$ (and therefore the F-statistic) is unchanged; only the formula is rewritten in terms of $S_{U}$**:

> $T^{2} = n(\bar{\mathbf{x}} - \mu_{0})^{T}S_{U}^{-1}(\bar{\mathbf{x}} - \mu_{0})$
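A quick numerical check of the identity $(n-1)S^{-1} = nS_{U}^{-1}$ on this data set. This is a sketch: the MLE is rebuilt by rescaling `var()` by $(n-1)/n$, and the names `S.mle` and `S.u` are illustrative.

```r
boys <- data.frame(
  b.height = c(78, 76, 92, 81, 81, 84),
  b.chest  = c(60.6, 58.1, 63.2, 59, 60.8, 59.5),
  b.muac   = c(16.5, 12.5, 14.5, 14, 15.5, 14)
)
mu.0 <- c(87.8, 58.4, 15.9)
n <- nrow(boys)

d     <- colMeans(boys) - mu.0
S.u   <- var(boys)                # unbiased: divisor n - 1
S.mle <- S.u * (n - 1) / n        # MLE: divisor n

# Same T^2 from both forms of the formula
T2.mle <- (n - 1) * drop(t(d) %*% solve(S.mle) %*% d)
T2.u   <- n * drop(t(d) %*% solve(S.u) %*% d)
all.equal(T2.mle, T2.u)  # TRUE
```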

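```r
mu.0 <- c(87.8, 58.4, 15.9)

# Sample mean vector (column means)
x.bar <- colMeans(boys)
x.bar  # b.height 82.0, b.chest 60.2, b.muac 14.5

# var() on a data frame returns the unbiased covariance matrix S_U (divisor n - 1)
S <- var(boys)

# solve() with a single matrix argument returns its inverse
S.inv <- solve(S)
```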

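```r
# T^2 = n (x.bar - mu.0)' S_U^{-1} (x.bar - mu.0);
# drop() turns the 1 x 1 matrix result into a plain number
T.2 <- drop(n * t(x.bar - mu.0) %*% S.inv %*% (x.bar - mu.0))
T.2  # 271.6115
```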


### Run the hypothesis test

> $T^{2} > \dfrac{(n-1)p}{n-p}F_{p, n-p, \alpha}$

> $F = T^{2} \times \dfrac{n-p}{(n-1)p} > F_{p, n-p, \alpha}$

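```r
# Rescale T^2 to an F(p, n - p) statistic; named F.stat because
# F is R's built-in shorthand for FALSE
F.stat <- T.2 * (n - p) / ((n - 1) * p)
F.stat  # 54.32229

# p-value: upper-tail probability of F(p, n - p) = F(3, 3)
pf(F.stat, p, n - p, lower.tail = FALSE)  # 0.004103264
```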


**We can therefore reject the null hypothesis at the 1% level of significance.**
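The same decision can be reached without a p-value by comparing $T^{2}$ with the critical value $\frac{(n-1)p}{n-p}F_{p, n-p, \alpha}$ from item 7. A short sketch at $\alpha = 0.01$, recomputing $T^{2}$ so the block runs on its own:

```r
boys <- data.frame(
  b.height = c(78, 76, 92, 81, 81, 84),
  b.chest  = c(60.6, 58.1, 63.2, 59, 60.8, 59.5),
  b.muac   = c(16.5, 12.5, 14.5, 14, 15.5, 14)
)
mu.0 <- c(87.8, 58.4, 15.9)
n <- nrow(boys)
p <- ncol(boys)
alpha <- 0.01

d   <- colMeans(boys) - mu.0
T.2 <- n * drop(t(d) %*% solve(var(boys), d))

# Upper-alpha quantile of F(p, n - p), rescaled to the T^2 scale
F.crit  <- qf(1 - alpha, p, n - p)
T2.crit <- (n - 1) * p / (n - p) * F.crit
T.2 > T2.crit  # TRUE: reject H0 at the 1% level
```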

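* If you ran three separate univariate t-tests instead, none of them would reject at even the 5% level: the t-statistics are about $-2.53$, $2.48$ and $-2.49$, all smaller in magnitude than the two-sided 5% critical value $t_{5, 0.025} \approx 2.571$.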
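The univariate comparison can be reproduced with base R's `t.test`, one two-sided one-sample test per variable against the matching entry of $\mu_{0}$. A sketch:

```r
boys <- data.frame(
  b.height = c(78, 76, 92, 81, 81, 84),
  b.chest  = c(60.6, 58.1, 63.2, 59, 60.8, 59.5),
  b.muac   = c(16.5, 12.5, 14.5, 14, 15.5, 14)
)
mu.0 <- c(87.8, 58.4, 15.9)

# One two-sided one-sample t-test per column, each on n - 1 = 5 df;
# mapply pairs each column of boys with its hypothesised mean
uni <- mapply(function(x, m) {
  tt <- t.test(x, mu = m)
  c(t = unname(tt$statistic), p.value = tt$p.value)
}, boys, mu.0)
round(uni, 4)  # rows: t, p.value; one column per variable
```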

The multivariate result therefore appears to contradict the univariate results.

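To see why, look at the sample correlations below together with the mean differences $\bar{\mathbf{x}} - \mu_{0} = [-5.8, 1.8, -1.4]^{T}$. Height and chest circumference are strongly positively correlated ($r \approx 0.80$), so a group that is shorter than $\mu_{0}$ would be expected to have a smaller chest as well; this group is shorter but has a larger chest. Each difference is only borderline on its own, but the combination is very unusual under $H_{0}$, and $T^{2}$ picks this up through $S^{-1}$.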



```r
cor(boys)
```

| | b.height | b.chest | b.muac |
|---|---:|---:|---:|
| b.height | 1.0000 | 0.8031 | 0.0645 |
| b.chest | 0.8031 | 1.0000 | 0.5336 |
| b.muac | 0.0645 | 0.5336 | 1.0000 |
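One way to isolate the effect of the correlations is to recompute $T^{2}$ with the off-diagonal covariances set to zero. That "diagonal" version reduces to the sum of the three squared univariate t-statistics. A sketch (the names `T2.full` and `T2.diag` are illustrative):

```r
boys <- data.frame(
  b.height = c(78, 76, 92, 81, 81, 84),
  b.chest  = c(60.6, 58.1, 63.2, 59, 60.8, 59.5),
  b.muac   = c(16.5, 12.5, 14.5, 14, 15.5, 14)
)
mu.0 <- c(87.8, 58.4, 15.9)
n <- nrow(boys)

d   <- colMeans(boys) - mu.0   # mean differences: -5.8, 1.8, -1.4
S.u <- var(boys)

# Full T^2 uses the covariances between variables
T2.full <- n * drop(t(d) %*% solve(S.u, d))

# Ignoring the covariances leaves the sum of squared univariate t's
T2.diag <- n * sum(d^2 / diag(S.u))
c(full = T2.full, diagonal = T2.diag)  # about 271.6 vs 18.7
```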


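**The linear combination $\mathbf{a}^{T}\mathbf{x}$ that deviates most from its hypothesised value $\mathbf{a}^{T}\mu_{0}$, in the sense of having the largest squared univariate t-statistic, has coefficients:**

> $\mathbf{a}^{*} \propto \mathbf{S}^{-1} (\bar{\mathbf{x}} - \mu_{0})$

The maximised squared t-statistic equals $T^{2}$ itself. Since $S$ and $S_{U}$ differ only by a scalar factor, using R's `S.inv` gives the same direction.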

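```r
S.inv %*% (x.bar - mu.0)
```

| | $\mathbf{a}^{*}$ |
|---|---:|
| b.height | -2.759249 |
| b.chest | 10.577627 |
| b.muac | -7.303719 |

The chest coefficient is positive while the height and MUAC coefficients are negative, so the most extreme direction contrasts chest circumference against height and MUAC.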
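A sketch that checks the claim about $\mathbf{a}^{*}$: the squared one-sample t-statistic of $\mathbf{a}^{T}\mathbf{x}$ is $n(\mathbf{a}^{T}\mathbf{d})^{2} / (\mathbf{a}^{T}S_{U}\mathbf{a})$ with $\mathbf{d} = \bar{\mathbf{x}} - \mu_{0}$, and by Cauchy-Schwarz it is maximised at $\mathbf{a} \propto S_{U}^{-1}\mathbf{d}$, where it equals $T^{2}$. The helper name `t2_comb` is hypothetical.

```r
boys <- data.frame(
  b.height = c(78, 76, 92, 81, 81, 84),
  b.chest  = c(60.6, 58.1, 63.2, 59, 60.8, 59.5),
  b.muac   = c(16.5, 12.5, 14.5, 14, 15.5, 14)
)
mu.0 <- c(87.8, 58.4, 15.9)
n   <- nrow(boys)
d   <- colMeans(boys) - mu.0
S.u <- var(boys)

# Squared one-sample t-statistic for the linear combination a' x
t2_comb <- function(a) n * drop(crossprod(a, d))^2 / drop(t(a) %*% S.u %*% a)

a.star <- solve(S.u, d)                  # S_U^{-1} (x.bar - mu_0)
T2     <- n * drop(crossprod(d, a.star))

t2_comb(a.star)                          # equals T2

# Univariate directions (unit vectors) give much smaller values
uni.t2 <- sapply(1:3, function(j) t2_comb(diag(3)[, j]))

# Random directions never exceed T2
set.seed(1)
rand.t2 <- replicate(1000, t2_comb(rnorm(3)))
max(rand.t2) <= T2
```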
