# $\chi^2$ Test of Independence

Just as ANOVA extends $t$ procedures to cases with more than two samples of numeric data, $\chi^2$ methods extend $z$-proportion procedures to categorical data with more than two groups or categories.
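To make the connection concrete, here is a minimal sketch for the two-group case, where the $\chi^2$ statistic equals the square of the two-proportion $z$ statistic. The counts are the Freshman and Sophomore figures from the example below, and the variable names (`chosen`, `totals`, `two_by_two`, and so on) are illustrative only.

```r
# Two-group illustration: students who have chosen a major, by class
chosen <- c(114, 168)                # Freshman, Sophomore
totals <- c(114 + 212, 168 + 171)    # class sizes

# Pooled two-proportion z statistic, computed by hand
p_pool <- sum(chosen) / sum(totals)
z <- (chosen[1] / totals[1] - chosen[2] / totals[2]) /
  sqrt(p_pool * (1 - p_pool) * sum(1 / totals))

# The same comparison via prop.test(), without continuity correction
z_version <- prop.test(chosen, totals, correct = FALSE)

# The same data as a 2 x 2 table, tested with chisq.test()
two_by_two <- matrix(c(114, 212, 168, 171), ncol = 2)
chi_version <- chisq.test(two_by_two, correct = FALSE)

# All three values agree
z^2
unname(z_version$statistic)
unname(chi_version$statistic)
```

With more than two groups there is no single $z$ statistic to square, which is where the $\chi^2$ test takes over.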

## Example

The table below shows, for a certain university, the number of students who are still undecided about their majors and the number who have already chosen a major, broken down by class year.

<table style="width:60%">
<tr>
  <th></th>
  <th>Freshman</th>
  <th>Sophomore</th>
  <th>Junior</th>
</tr>
<tr>
  <th>Have Chosen a Major</th>
  <td style="text-align: center;">114</td>
  <td style="text-align: center;">168</td>
  <td style="text-align: center;">198</td>
</tr>
<tr>
  <th>Have <b>not</b> Chosen a Major</th>
  <td style="text-align: center;">212</td>
  <td style="text-align: center;">171</td>
  <td style="text-align: center;">92</td>
</tr>
</table>

We create the observed data below:

In [16]:
# matrix() fills column by column, so each pair of counts is one class year
obs <- matrix(c(114, 212, 168, 171, 198, 92), ncol = 3)
obs

0,1,2
114,168,198
212,171,92


We add column titles and row titles as follows:

In [17]:
colnames(obs) <- c('Freshman', 'Sophomore', 'Junior')
rownames(obs) <- c('Have Chosen', 'Have NOT Chosen')
obs

,Freshman,Sophomore,Junior
Have Chosen,114,168,198
Have NOT Chosen,212,171,92


### Conduct the Test

In [18]:
chisq.test(obs)


	Pearson's Chi-squared test

data:  obs
X-squared = 68.207, df = 2, p-value = 1.545e-15
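The reported statistic can be reproduced by hand, which may help show what `chisq.test` is computing. This sketch assumes the usual Pearson formula, $\chi^2 = \sum (O - E)^2 / E$, with expected counts $E = (\text{row total})(\text{column total}) / n$ and $(r-1)(c-1)$ degrees of freedom. The names `expected`, `stat`, `df`, and `p_value` are illustrative.

```r
obs <- matrix(c(114, 212, 168, 171, 198, 92), ncol = 3)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells
stat <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(stat, df, lower.tail = FALSE)

round(stat, 3)   # 68.207, matching X-squared in the test output
df               # 2
p_value
```

Printing `expected` is also a quick way to check the common rule of thumb that every expected count should be at least 5.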


### Reporting Out

Because $p = 1.545\times 10^{-15} < \alpha = 0.05$, we reject the null hypothesis that having chosen a major is independent of class year. We thus have evidence that the proportion of students who have already chosen a major depends on their year in school.
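A significant result says only that some dependence exists, so it can help to look at where it comes from. The sketch below is one possible follow-up, not part of the test itself: it computes the proportion within each class year and the Pearson residuals that `chisq.test` returns. The names `props` and `resid` are illustrative.

```r
obs <- matrix(c(114, 212, 168, 171, 198, 92), ncol = 3,
              dimnames = list(c('Have Chosen', 'Have NOT Chosen'),
                              c('Freshman', 'Sophomore', 'Junior')))

# Proportions within each class year (margin = 2 normalizes each column)
props <- prop.table(obs, margin = 2)
round(props, 3)

# Pearson residuals: (observed - expected) / sqrt(expected), one per cell
resid <- chisq.test(obs)$residuals
round(resid, 2)
```

The share of students who have chosen a major rises from about 35% of freshmen to about 50% of sophomores and 68% of juniors, which is the pattern behind the small $p$-value.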