# $\chi^2$ Test of Independence

When working with qualitative or categorical data, we can’t calculate means and standard deviations. Instead, we tally the counts of each outcome and calculate percentages or proportions.

We have two main proportion tests: one based on the $z$ distribution and one based on the $\chi^2$ distribution. The $z$-proportion test can only be used with one or two samples. We use $\chi^2$ for categorical variables with more than two levels. Because $\chi^2$ is a more robust statistic, we sometimes also use it for small-sample experiments in the two-sample case.

## Tools for Work

To conduct a $\chi^2$ test of independence by hand, we will need the following:

- [Statistics Formula Sheet](https://faculty.ung.edu/rsinn/3350/StatsFormulas.pdf)
- [$\chi^2$ Table](https://faculty.ung.edu/rsinn/3350/Table_ChiSquared.pdf)
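
For reference, the by-hand calculation rests on three standard formulas. The notation here is ours and may differ from the formula sheet: $O_{ij}$ is the observed count in row $i$ and column $j$, $E_{ij}$ is the corresponding expected count under independence, $R_i$ and $C_j$ are the row and column totals, $N$ is the grand total, and $r$ and $c$ are the numbers of rows and columns.

$$E_{ij} = \frac{R_i \, C_j}{N}, \qquad \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1)$$

The computed statistic is then compared with the critical value in the $\chi^2$ table for the matching degrees of freedom.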

We will also use the following data set:

In [28]:
pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')

## Example 1: Learning Styles

A psychologist questions whether preferred learning method differs by gender. A group is surveyed, and the results are shown below. Test whether a relationship exists between gender and learning method.

<table style="width:25%">
  <tr>
    <th></th>
    <th style="text-align: center">Male</th>
    <th style="text-align: center">Female</th> 
  </tr>
  <tr>
    <td style="text-align: center"><b>Visual</b></td>
    <td style="text-align: center">25</td>
    <td style="text-align: center">14</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Auditory</b></td>
    <td style="text-align: center">24</td>
    <td style="text-align: center">42</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Kinesthetic</b></td>
    <td style="text-align: center">41</td>
    <td style="text-align: center">34</td>
  </tr>
</table>

In [29]:
# matrix() fills column by column: first column Male, second column Female
data <- matrix(c(25, 24, 41, 14, 42, 34), nrow = 3)
data

     [,1] [,2]
[1,]   25   14
[2,]   24   42
[3,]   41   34


In [30]:
result <- chisq.test(data)
result


	Pearson's Chi-squared test

data:  data
X-squared = 8.665, df = 2, p-value = 0.01313
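
The output above can be reproduced step by step, which mirrors the by-hand method. The following is a minimal sketch: the names `observed`, `expected`, `chi_sq`, `critical`, and `p_value` are ours, and the 5% significance level is an assumption, since the example does not specify one.

```r
# Observed counts from the table (columns: Male, Female)
observed <- matrix(c(25, 24, 41, 14, 42, 34), nrow = 3,
                   dimnames = list(c("Visual", "Auditory", "Kinesthetic"),
                                   c("Male", "Female")))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Test statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Critical value at an assumed 5% significance level, and the p-value
critical <- qchisq(0.95, df)
p_value  <- pchisq(chi_sq, df, lower.tail = FALSE)

round(c(chi_sq = chi_sq, df = df, critical = critical, p_value = p_value), 4)
```

The statistic (8.665) exceeds the 5% critical value for $df = 2$ (about 5.991), and equivalently the p-value of 0.01313 is below 0.05. At that level we would reject the null hypothesis of independence: the data suggest that learning-method preference is related to gender.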


## Example 2: Snacks vs. Type of Movie

<table style="width:35%">
  <tr>
    <th></th>
    <th style="text-align: center">Snacks</th>
    <th style="text-align: center">No Snacks</th> 
  </tr>
  <tr>
    <td style="text-align: center"><b>Action</b></td>
    <td style="text-align: center">55</td>
    <td style="text-align: center">75</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Comedy</b></td>
    <td style="text-align: center">135</td>
    <td style="text-align: center">155</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Family</b></td>
    <td style="text-align: center">85</td>
    <td style="text-align: center">55</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Horror</b></td>
    <td style="text-align: center">45</td>
    <td style="text-align: center">15</td>
  </tr>
</table>

Conduct a test of independence between type of movie and whether attendees purchased snacks to take into the movie.
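
One way to set this up in R follows the same pattern as Example 1. This is a sketch rather than a worked solution; the object names `snacks` and `snack_test` are ours.

```r
# Observed counts from the table (rows: movie type; columns: snack purchase)
snacks <- matrix(c(55, 135, 85, 45,    # Snacks
                   75, 155, 55, 15),   # No Snacks
                 nrow = 4,
                 dimnames = list(c("Action", "Comedy", "Family", "Horror"),
                                 c("Snacks", "No Snacks")))

snack_test <- chisq.test(snacks)

snack_test$expected   # expected counts under independence
snack_test            # statistic, degrees of freedom, and p-value
```

With four rows and two columns, the test has $(4-1)(2-1) = 3$ degrees of freedom. Inspecting `snack_test$expected` is a quick way to confirm that no expected count is too small for the $\chi^2$ approximation.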

## Example 3: Accept the Date

Using the **AccDate** variable in the personality data set, test whether Yes/No responses to the dating question depend upon biological sex.

In [31]:
# Keep only the two columns needed for the test (columns 3 and 14 of pers)
date <- pers[c("Sex", "AccDate")]
head(date, 4)

  Sex AccDate
1   M       N
2   F       Y
3   M       Y
4   F       N


In [32]:
xtabs(~AccDate + Sex, data = date)

       Sex
AccDate  F  M
      N 28 28
      Y 46 27
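
To finish the test, the cross-tabulation can be passed to `chisq.test()`. The sketch below re-enters the counts shown above so that it runs without downloading the file; the names `acc_tab`, `corrected`, `uncorrected`, and `z_version` are ours. For a 2×2 table, `chisq.test()` applies Yates' continuity correction by default, so a call with `correct = FALSE` is included to show the uncorrected Pearson statistic, which is the one the by-hand formula gives. The `prop.test()` call illustrates the link to the two-sample proportion test mentioned at the start of this section.

```r
# Counts copied from the xtabs() output (rows: AccDate; columns: Sex)
acc_tab <- matrix(c(28, 46, 28, 27), nrow = 2,
                  dimnames = list(AccDate = c("N", "Y"), Sex = c("F", "M")))

# Default for a 2x2 table: Yates' continuity correction is applied
corrected <- chisq.test(acc_tab)

# Uncorrected Pearson statistic
uncorrected <- chisq.test(acc_tab, correct = FALSE)

# Two-sample test of proportions on the same counts:
# number of "Y" responses out of each sex's total
z_version <- prop.test(x = acc_tab["Y", ], n = colSums(acc_tab), correct = FALSE)

corrected
uncorrected
z_version$statistic   # matches the uncorrected chi-squared statistic
```

In both versions the statistic falls below the 5% critical value for $df = 1$ (about 3.841), so at that assumed significance level these data do not provide evidence that responses to the dating question depend on sex.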