# The $\chi^2$-Distribution

## Objectives
- Calculate probabilities using the $\chi^2$-distribution.

## The $\chi^2$-Distribution

In this chapter, we learn how to conduct hypothesis tests using the $\chi^2$-distribution. The $\chi^2$-distribution ($\chi$ is the Greek letter "chi", pronounced "kai") is different from the normal distribution and the $t$-distribution. Like the $t$-distribution, it depends on the degrees of freedom $df$, so there is a different $\chi^2$-distribution for each value of $df$. The mean of the $\chi^2$-distribution is $\mu = df$, and the standard deviation is $\sigma = \sqrt{2(df)}$.

The random variable for a $\chi^2$-distribution with $k$ degrees of freedom is the sum of $k$ independent squared standard normal variables:

$$ \chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2. $$
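
The definition above can be checked by simulation. The following is a minimal sketch rather than part of the original material: the seed, the number of simulated values `n_sims`, and the choice `k = 4` are arbitrary assumptions made for illustration.

```R
# Simulate a chi-squared random variable as a sum of k squared standard normals.
set.seed(1)        # arbitrary seed, for reproducibility only
k <- 4             # degrees of freedom (illustrative choice)
n_sims <- 100000   # number of simulated values (illustrative choice)

# Each row holds k independent standard normal draws Z_1, ..., Z_k.
z <- matrix(rnorm(n_sims * k), nrow = n_sims, ncol = k)
chisq_sim <- rowSums(z^2)

# The sample mean should be close to df = k, and the
# sample standard deviation close to sqrt(2 * k).
c(sim_mean = mean(chisq_sim), theory_mean = k)
c(sim_sd = sd(chisq_sim), theory_sd = sqrt(2 * k))
```

With a sample this large, the simulated mean and standard deviation should land near $\mu = df$ and $\sigma = \sqrt{2(df)}$, although the exact digits depend on the seed.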

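Unlike the normal distribution and the $t$-distribution, the curve of the $\chi^2$-distribution is asymmetrical: it is skewed to the right, and because $\chi^2$ is a sum of squares, it takes only nonnegative values.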
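
One way to see the asymmetry numerically is to compare the mode, median, and mean. This is a rough sketch under two assumptions not stated in the text: that the mode of a $\chi^2$-distribution with $df \geq 2$ is $df - 2$, and that the median can be read off with R's quantile function `qchisq`, which is not otherwise used in this section.

```R
# Compare mode, median, and mean for df = 2, 8, and 24.
deg_free <- c(2, 8, 24)

mode_val   <- pmax(deg_free - 2, 0)        # location of the peak of the curve
median_val <- qchisq(0.5, df = deg_free)   # value with half the area to its left
mean_val   <- deg_free                     # mean equals the degrees of freedom

data.frame(df = deg_free, mode = mode_val, median = median_val, mean = mean_val)
```

In each row the mode is smaller than the median, which in turn is smaller than the mean. A symmetric curve such as the normal distribution would have all three equal.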

```R
# Draw three chi-squared density curves (df = 2, 8, 24) stacked in one image.
png("chisq_diff_df.png", width = 1000, height = 750)

# Three rows, one column; small margins because the y-axes are not drawn.
par(mfrow = c(3, 1), mar = c(2, 0, 0, 0), mgp = c(3, 2, 0))

# Density values on a fine grid from 0 to 60.
x = seq(0, 60, 0.01)
y1 = dchisq(x, df = 2)
y2 = dchisq(x, df = 8)
y3 = dchisq(x, df = 24)

# Closed polygons used to shade the area under each curve.
polyx = c(0, x, 60)
polyy1 = c(0, y1, 0)
polyy2 = c(0, y2, 0)
polyy3 = c(0, y3, 0)

# Common y-limit: the height of the df = 2 curve at 0, the tallest point plotted.
M = dchisq(0, 2)

# Panel 1: df = 2. The dashed line marks the mean, which equals df.
plot(x, y1, type = "l", xlab = "", ylab = "", ylim = c(0, M), axes = FALSE, cex = 2, cex.lab = 2)
polygon(polyx, polyy1, col = "gray90", border = NA)
text(3.9, dchisq(2, 2) + 0.01, labels = "df = 2", cex = 3, pos = 3)
axis(1, pos = 0, at = 0:60, lwd.ticks = 0, cex.axis = 3)
lines(c(2, 2), c(0, dchisq(2, 2)), type = "l", lty = 2)
lines(x, y1, type = "l")
lines(c(0, 0), c(0, M))

# Panel 2: df = 8.
plot(x, y2, type = "l", xlab = "", ylab = "", ylim = c(0, M), axes = FALSE, cex = 2, cex.lab = 2)
polygon(polyx, polyy2, col = "gray90", border = NA)
text(8, dchisq(8, 8) + 0.01, labels = "df = 8", cex = 3, pos = 3)
axis(1, pos = 0, at = 0:60, lwd.ticks = 0, cex.axis = 3)
lines(c(8, 8), c(0, dchisq(8, 8)), type = "l", lty = 2)
lines(x, y2, type = "l")

# Panel 3: df = 24.
plot(x, y3, type = "l", xlab = "", ylab = "", ylim = c(0, M), axes = FALSE, cex = 2, cex.lab = 2)
polygon(polyx, polyy3, col = "gray90", border = NA)
text(24, dchisq(24, 24), labels = "df = 24", cex = 3, pos = 3)
axis(1, pos = 0, at = 0:60, lwd.ticks = 0, cex.axis = 3)
lines(c(24, 24), c(0, dchisq(24, 24)), type = "l", lty = 2)
lines(x, y3, type = "l")

dev.off()
```

```{figure} chisq_diff_df.png
---
width: 100%
alt: The plots of the chi-squared distributions with df = 2, with df = 8, and with df = 24.
name: chisq_diff_df
---
Three different $\chi^2$-distributions with different degrees of freedom. The mean $\mu$ of a $\chi^2$-distribution is equal to its degrees of freedom $df$, so $\mu = df$.
```

To find the probability to the left of a given value in a $\chi^2$-distribution, we will use the R function

```R
pchisq(q, df)
```

where <code>q</code> is the $\chi^2$-value, and <code>df</code> is the degrees of freedom.
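
To connect `pchisq` with the area under the density curve, the sketch below compares it with a numerical integral of `dchisq`. The values `q_val = 2` and `deg_free = 4` are arbitrary illustrative choices, and using base R's `integrate` here is an assumption for the sake of the check, not a method the text relies on.

```R
# pchisq(q, df) is the area under the density curve dchisq between 0 and q.
q_val    <- 2   # chi-squared value (illustrative choice)
deg_free <- 4   # degrees of freedom (illustrative choice)

# Numerically integrate the density from 0 to q_val;
# the df argument is passed through to dchisq.
area_left <- integrate(dchisq, lower = 0, upper = q_val, df = deg_free)$value

area_left                       # numerical area under the curve
pchisq(q_val, df = deg_free)    # the same probability from pchisq
```

The two numbers should agree up to the small error of the numerical integration.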

***


### Example 7.1.1
Consider a $\chi^2$-distribution with $4$ degrees of freedom.

1. Find $P(\chi^2 < 2)$.
2. Find $P(\chi^2 > 6)$.
3. Find $P(3 < \chi^2 < 5)$.

#### Solution
##### Part 1
We can calculate $P(\chi^2 < 2)$ simply using R:

```R
pchisq(q = 2, df = 4)
```

So $P(\chi^2 < 2) \approx 0.2642$.

##### Part 2
We want to find $P(\chi^2 > 6)$, which is the area to the *right* of $\chi^2 = 6$. By default, the <code>pchisq</code> function returns the probability to the *left* of the given value, so we will use the formula $P(\chi^2 > 6) = 1 - P(\chi^2 < 6)$ to calculate what we need. In R, this formula becomes:

```R
1 - pchisq(q = 6, df = 4)
```

So $P(\chi^2 > 6) \approx 0.1991$.
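
As an alternative to subtracting from 1, `pchisq` accepts a `lower.tail` argument. Setting `lower.tail = FALSE` asks for the area to the right directly. The short sketch below compares the two approaches; the variable names are only illustrative.

```R
# Upper-tail probability P(chi^2 > 6) with 4 degrees of freedom, computed two ways.
p_complement <- 1 - pchisq(q = 6, df = 4)                     # complement rule
p_upper      <- pchisq(q = 6, df = 4, lower.tail = FALSE)     # right tail directly

p_complement
p_upper
all.equal(p_complement, p_upper)   # checks agreement up to rounding error
```

Both approaches give the same answer here. The `lower.tail = FALSE` form tends to be more accurate when the right-tail probability is extremely small.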

##### Part 3
To find $P(3 < \chi^2 < 5)$, we will first calculate *all* the area to the left of the larger value $\chi^2 = 5$, then subtract the excess area to the left of the smaller value $\chi^2 = 3$.

```R
pchisq(q = 5, df = 4) - pchisq(q = 3, df = 4)
```

So $P(3 < \chi^2 < 5) \approx 0.2705$.
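
The subtraction pattern from Part 3 can be wrapped in a small function. The helper `chisq_between` below is hypothetical (it is not an R built-in and does not appear elsewhere in this chapter); it is a sketch showing that all three parts of the example follow the same "area to the left of the upper value minus area to the left of the lower value" idea.

```R
# Hypothetical helper (not a built-in): P(lower < chi^2 < upper) for a given df.
chisq_between <- function(lower, upper, df) {
  stopifnot(lower <= upper, df > 0)
  pchisq(upper, df = df) - pchisq(lower, df = df)
}

part1 <- chisq_between(0, 2, df = 4)     # Part 1: chi^2 is never negative
part2 <- chisq_between(6, Inf, df = 4)   # Part 2: everything to the right of 6
part3 <- chisq_between(3, 5, df = 4)     # Part 3: area between 3 and 5

round(c(part1 = part1, part2 = part2, part3 = part3), 4)
```

The rounded values should match the answers found in Parts 1 through 3.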
