# Student's $t$-Distribution

## Objectives
- Compute probabilities using Student's $t$-distribution.

## Why Student's $t$-Distribution?
In practice, when constructing a confidence interval for a population mean, we rarely know the population standard deviation. In the past, when the sample size was large, this did not present a problem to statisticians. They used the sample standard deviation $s$ as an estimate for $\sigma$ and proceeded as before to calculate a confidence interval with close enough results. However, statisticians ran into problems when the sample size was small. A small sample size caused inaccuracies in the confidence interval.

William S. Gosset (1876–1937) of the Guinness brewery in Dublin, Ireland ran into this problem. His experiments with hops and barley produced very few samples. Just replacing the population standard deviation $\sigma$ with the sample standard deviation $s$ did not produce accurate results when he tried to calculate a confidence interval. He realized that he could not use a normal distribution for the calculation; he found that the actual distribution depends on the sample size. This problem led him to "discover" what is called the Student's $t$-distribution. The name comes from the fact that Gosset wrote under the pen name "Student" when he published his results.

Up until the mid-1970s, some statisticians used the normal distribution approximation for large sample sizes and used Student's $t$-distribution only for sample sizes of at most $30$. With graphing calculators and computers, the practice now is to use Student's $t$-distribution whenever $s$ is used as an estimate for $\sigma$.

## Student's $t$-Distribution
If you draw a simple random sample of size $n$ from a population that has an approximately normal distribution with mean $\mu$ and unknown population standard deviation $\sigma$ and calculate the $t$-score 

$$t = \frac{\bar{x} - \mu}{\left(\frac{s}{\sqrt{n}}\right)},$$

then the $t$-scores follow a Student's $t$-distribution **with $n - 1$ degrees of freedom**. The $t$-score has the same interpretation as the $z$-score: it measures how far $\bar{x}$ is from its mean $\mu$, in units of the estimated standard error $s/\sqrt{n}$. For each sample size $n$, there is a different Student's $t$-distribution.
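One way to see this claim in action is a small simulation. The sketch below assumes a hypothetical normal population with $\mu = 50$ and $\sigma = 10$ and a sample size of $n = 5$; these values, the seed, and the variable names are illustrative choices only. It compares the simulated proportion of $t$-scores beyond $\pm 2$ with what a $t$-distribution with $n - 1$ degrees of freedom and the standard normal distribution each predict.

```R
set.seed(1)              # arbitrary seed so the run is repeatable
mu <- 50; sigma <- 10    # hypothetical population parameters (assumed)
n <- 5                   # small sample size
reps <- 20000            # number of simulated samples

# Draw a sample, then compute its t-score using s in place of sigma
t_scores <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

simulated <- mean(abs(t_scores) > 2)   # observed proportion beyond +/- 2
t_pred <- 2 * pt(-2, df = n - 1)       # two-tail area under the t curve
z_pred <- 2 * pnorm(-2)                # two-tail area under the normal curve
c(simulated = simulated, t = t_pred, normal = z_pred)
```

With a sample this small, the simulated proportion should sit close to the $t$ prediction and noticeably above the normal prediction.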

The degrees of freedom, $n - 1$, come from the calculation of the sample standard deviation $s$. Recall that if we have a sample of size $n$, we use $n$ deviations (that is, the $n$ values of $(x - \bar{x})$) to calculate $s$. Because the sum of the deviations is zero, we can find the last deviation once we know the other $n - 1$ deviations. This means that $n - 1$ of the deviations can vary freely, but once they are known, the final deviation is completely determined. We call the number $n - 1$ the degrees of freedom ($df$).
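The argument about deviations can be checked on a small data set. The sample values below are made up for illustration; the sketch shows that the deviations sum to zero, that the last deviation is forced by the others, and that R's `sd()` divides by $n - 1$.

```R
x <- c(4.1, 5.3, 6.0, 4.8, 5.8)   # hypothetical sample, n = 5
deviations <- x - mean(x)

sum(deviations)                    # zero, up to floating-point rounding

# The last deviation is determined by the first n - 1 deviations
last_from_others <- -sum(deviations[1:4])
all.equal(last_from_others, deviations[5])

# sd() divides the sum of squared deviations by n - 1
s_by_hand <- sqrt(sum(deviations^2) / (length(x) - 1))
all.equal(s_by_hand, sd(x))
```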

Here is some more information about Student's $t$-distribution:
- The graph for the Student's $t$-distribution is similar to the standard normal curve.
- The mean for the Student's $t$-distribution is zero and the distribution is symmetric about zero.
- The Student's $t$-distribution has more probability in its tails than the standard normal distribution because the spread of the $t$-distribution is greater than the spread of the standard normal. So the graph of the Student's $t$-distribution will be thicker in the tails and shorter in the center than the graph of the standard normal distribution.
- The exact shape of the Student's $t$-distribution depends on the degrees of freedom. As the degrees of freedom increase, the graph of Student's $t$-distribution becomes more like the graph of the standard normal distribution.

```R
# Save the figure to a PNG file
png("t_dists.png", width = 1000, height = 500)

par(mar = c(2, 0, 0, 0))  # leave room only for the bottom axis

t_vals <- seq(-3, 3, 0.01)
width <- 4  # line width shared by all three curves

# Standard normal density
plot(t_vals, dnorm(t_vals), type = "l", col = "black", xlab = "", ylab = "", axes = FALSE, lwd = width, ylim = c(0, dnorm(0) + 0.06), lty = 1)

# Student's t densities with 12 and 3 degrees of freedom
lines(t_vals, dt(t_vals, df = 12), col = "red", lwd = width, lty = 6)
lines(t_vals, dt(t_vals, df = 3), col = "blue", lwd = width, lty = 5)

axis(1, pos = 0, at = -3:3, labels = -3:3, lwd.ticks = 0, cex.axis = 2)

# Dashed vertical line marking the common center at zero
lines(c(0, 0), c(0, dnorm(0)), lty = 2)

legend(1, 0.4, legend = c("Standard Normal", "Student's t, df=12", "Student's t, df=3"), col = c("black", "red", "blue"), lty = c(1, 6, 5), lwd = width, cex = 2)

dev.off()
```

```{figure} t_dists.png
---
width: 100%
alt: The standard normal distribution, a Student's t-distribution with df = 12, and a Student's t-distribution with df = 3.
name: t_dists
---
The standard normal distribution, a Student's $t$-distribution with $df = 12$, and a Student's $t$-distribution with $df = 3$.
```
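The pattern in the figure can also be checked numerically. As a rough sketch, the code below compares the area to the left of $-2$ and the height at zero for several degrees of freedom; the cutoff $-2$ and the list of degrees of freedom are arbitrary choices for illustration.

```R
dfs <- c(3, 12, 30, 100, 1000)   # degrees of freedom to compare (arbitrary)

# Area to the left of -2 under each t curve and under the normal curve
t_tail <- pt(-2, df = dfs)
z_tail <- pnorm(-2)

data.frame(df = dfs, t_tail = t_tail, excess_over_normal = t_tail - z_tail)

# Height at the center: each t curve is lower than the normal curve
t_center <- dt(0, df = dfs)
z_center <- dnorm(0)
t_center
z_center
```

The tail area shrinks toward the normal value and the center height rises toward the normal height as the degrees of freedom grow.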

## Student's $t$-Distribution Using R
We can use R to run calculations involving Student's $t$-distribution in almost the same way we did calculations involving the standard normal distribution. Recall that to find the area under the standard normal curve to the left of a $z$-value, we use the function <code>pnorm(q)</code>. Similarly, to find the area under Student's $t$-distribution to the left of a $t$-value, we use the function

```R
pt(q, df)
```

where <code>q</code> is the $t$-value, and <code>df</code> is the degrees of freedom.

***


### Example 5.3.1
Consider a $t$-distribution with $11$ degrees of freedom.

1. Find $P(T < -0.5)$.
2. Find $P(T > 0.3)$.
3. Find $P(-1.0 < T < 1.2)$.

#### Solution
##### Part 1

```R
pt(q = -0.5, df = 11)
```

So $P(T < -0.5) = 0.3135$.

##### Part 2

Just like with the <code>pnorm</code> function, the <code>pt</code> function only gives the area or probability to the *left* of a value. But in this case, we want to find $P(T > 0.3)$, which is the probability to the *right* of $t = 0.3$. Since the total probability is always equal to $1 = 100\%$, we can find this using the formula

$$ P(T > 0.3) = 1 - P(T < 0.3), $$

just like we do with the standard normal distribution. We use R to perform the calculation.

```R
1 - pt(q = 0.3, df = 11)
```

So $P(T > 0.3) = 0.3849$.
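As an aside, the same right-tail area can be obtained in two other ways. The sketch below uses the `lower.tail` argument of `pt` and the symmetry of the $t$-distribution about zero; the variable names are only for illustration.

```R
# Ask pt() for the upper-tail area directly
upper_direct <- pt(q = 0.3, df = 11, lower.tail = FALSE)
upper_direct

# By symmetry about zero, the same area lies to the left of -0.3
upper_by_symmetry <- pt(q = -0.3, df = 11)
upper_by_symmetry
```

Both values should agree with the result of the complement rule used above.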

##### Part 3
As we do with the normal distribution, to find $P(-1.0 < T < 1.2)$, we will first find all the area to the left of the larger value $t = 1.2$, then subtract the excess area to the left of the smaller value $t = -1.0$.

```R
pt(q = 1.2, df = 11) - pt(q = -1.0, df = 11)
```

So $P(-1.0 < T < 1.2) = 0.7029$.
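Because this probability is an area under the density curve, it can be cross-checked by numerical integration. The sketch below is only a sanity check, not a replacement for `pt`; it integrates the density `dt` between the two endpoints with R's `integrate` function.

```R
# Area under the t density (11 degrees of freedom) between -1.0 and 1.2
area <- integrate(function(x) dt(x, df = 11), lower = -1.0, upper = 1.2)
area$value

# Difference of cumulative probabilities, as in the solution
by_pt <- pt(q = 1.2, df = 11) - pt(q = -1.0, df = 11)
by_pt
```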

***

Given a value, we can find the corresponding probability using <code>pnorm</code> (for a standard normal distribution) or <code>pt</code> (for a $t$-distribution). We can also do the reverse: given a probability, we can find the value. For the standard normal distribution, we've seen that the function for doing this is <code>qnorm(p)</code>. Similarly, the function for finding a value given a probability for a $t$-distribution is

```R
qt(p, df)
```

where <code>p</code> is the probability to the left of the $t$-value, and <code>df</code> is the degrees of freedom.
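Since <code>qt</code> reverses <code>pt</code>, passing its output back to <code>pt</code> should recover the original probabilities. The short check below uses arbitrary probabilities and an arbitrary choice of 10 degrees of freedom.

```R
deg_free <- 10                      # arbitrary degrees of freedom
p <- c(0.05, 0.25, 0.5, 0.9)        # arbitrary left-tail probabilities

t_values <- qt(p, df = deg_free)    # probabilities -> t-values
recovered <- pt(t_values, df = deg_free)   # t-values -> probabilities
recovered

# The value with half the area to its left is zero, by symmetry
qt(0.5, df = deg_free)
```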

***

### Example 5.3.2
Consider a $t$-distribution with $22$ degrees of freedom.

1. Find $t$ so that the area to the left of $t$ is 0.12.
2. Find $t$ so that the area to the right of $t$ is 0.34.
3. Find $t_{0.05}$.

#### Solution
##### Part 1

```R
qt(p = 0.12, df = 22)
```

So the area to the left of $t = -1.2077$ is $0.12$.

##### Part 2
Just like with the <code>qnorm</code> function, the <code>qt</code> function expects a probability to the *left* of the $t$-value we want. But for this problem, we are asked to find a $t$-value so that the area or probability to the right of $t$ is $0.34$. To find the $t$-value, we first need to find the probability to the left of $t$. Since the total probability is equal to $1 = 100\%$, if the probability to the right of $t$ is $0.34$, that means the probability to the left of $t$ is $1 - 0.34$. We use R to run the calculation.

```R
qt(p = 1 - 0.34, df = 22)
```

So the area to the right of $t = 0.4180$ is $0.34$.

##### Part 3
Just as we saw with $z$-scores, the notation $t_{0.05}$ is the $t$-value with area $0.05$ to its right.

```R
qt(p = 1 - 0.05, df = 22)
```

So $t_{0.05} = 1.7171$.
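
The same critical value can be reached in a couple of equivalent ways, and it is instructive to compare it with the corresponding $z$-value. The sketch below uses the `lower.tail` argument of `qt` and the symmetry of the distribution; the variable names are illustrative.

```R
# Three equivalent ways to get t_{0.05} with 22 degrees of freedom
t_a <- qt(p = 1 - 0.05, df = 22)
t_b <- qt(p = 0.05, df = 22, lower.tail = FALSE)
t_c <- -qt(p = 0.05, df = 22)       # symmetry about zero
c(t_a, t_b, t_c)

# The matching z-value, with area 0.05 to its right under the normal curve
z_crit <- qnorm(p = 1 - 0.05)
z_crit
```

The $t$-value is larger than the $z$-value because the $t$-distribution puts more probability in its tails.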