# S06: Chi-Square



## Categorical Data Analysis with $\chi^2$

The goals of <span style = 'color:blue; font-weight:bold'>linear regression</span> and the <span style = 'color:blue; font-weight:bold'>$\chi^2$ test of independence</span> are very similar:

- A **linear regression** determines if a relationship exists between 2 <span style = 'color:red; font-weight:bold'>numeric</span> variables.
- A **$\chi^2$ test of independence** determines if a relationship exists between 2 <span style = 'color:red; font-weight:bold'>categorical</span> variables.

## Data Verification

To use the $\chi^2$ test of independence, no more than 20\% of the cells in the **expected matrix** may have a low count, where a "low cell count" is an expected count of **less than five.**
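The check above can be sketched in a few lines of R. The expected count for a cell is (row total × column total) / grand total, so the expected matrix can be built directly from the observed counts. The helper name `low_count_share` and the counts in `small_counts` are made up for illustration; they do not come from any of the examples.

```r
# Hypothetical helper (not part of base R): share of expected counts below a threshold
low_count_share <- function(observed, threshold = 5) {
  # Expected count for each cell = (row total * column total) / grand total
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  mean(expected < threshold)
}

# Made-up counts, chosen only to show a table that fails the check
small_counts <- matrix(c(3, 10, 12, 2, 8, 15), nrow = 3)
low_count_share(small_counts)          # proportion of low expected cells
low_count_share(small_counts) <= 0.20  # TRUE means the condition is met
```

In this made-up table, two of the six expected counts are 2.5, so one third of the cells are low and the condition is not met.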

## Example 1: Learning Styles

A psychologist questions whether preference of learning method differs by gender. A group is surveyed, and the results are shown below. Test whether a relationship exists between gender and learning method.

<table style="width:40%">
  <tr>
    <th></th>
    <th style="text-align: center">Male</th>
    <th style="text-align: center">Female</th> 
  </tr>
  <tr>
    <td style="text-align: center"><b>Visual</b></td>
    <td style="text-align: center">25</td>
    <td style="text-align: center">14</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Auditory</b></td>
    <td style="text-align: center">24</td>
    <td style="text-align: center">42</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Kinesthetic</b></td>
    <td style="text-align: center">41</td>
    <td style="text-align: center">34</td>
  </tr>
</table>

```r
# Observed counts, filled column by column: Male first, then Female
data <- matrix(c(25, 24, 41, 14, 42, 34), nrow = 3)
data
```

```
     [,1] [,2]
[1,]   25   14
[2,]   24   42
[3,]   41   34
```


```r
# Run the test and show the expected counts under independence
result <- chisq.test(data)
result$expected
```

```
     [,1] [,2]
[1,] 19.5 19.5
[2,] 33.0 33.0
[3,] 37.5 37.5
```
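The cell above stops at the expected counts. The test statistic compares observed counts $O$ with expected counts $E$ through $\chi^2 = \sum \frac{(O - E)^2}{E}$. A minimal sketch of the remaining steps might look like the following; the significance level of 0.05 is an assumption, since Example 1 does not state one.

```r
# Counts from the Example 1 table
# (rows: Visual, Auditory, Kinesthetic; columns: Male, Female)
data <- matrix(c(25, 24, 41, 14, 42, 34), nrow = 3)
result <- chisq.test(data)

# Data verification: share of expected counts below five must not exceed 20%
mean(result$expected < 5) <= 0.20

# Test statistic, degrees of freedom, and p-value
result$statistic
result$parameter
result$p.value

# Assumed significance level; Example 1 does not state one
alpha <- 0.05
result$p.value < alpha  # TRUE means reject independence at this level
```

The degrees of freedom are (rows − 1) × (columns − 1), which is 2 for this 3 × 2 table.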


## Example 2: Snacks vs. Type of Movie

Conduct a test of independence between Type of Movie and whether attendees purchased snacks to take into the movie. Test at the $\alpha = 0.05$ level.

<table style="width:35%">
  <tr>
    <th></th>
    <th style="text-align: center">Snacks</th>
    <th style="text-align: center">No Snacks</th> 
  </tr>
  <tr>
    <td style="text-align: center"><b>Action</b></td>
    <td style="text-align: center">55</td>
    <td style="text-align: center">75</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Comedy</b></td>
    <td style="text-align: center">135</td>
    <td style="text-align: center">155</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Family</b></td>
    <td style="text-align: center">85</td>
    <td style="text-align: center">55</td>
  </tr>
  <tr>
    <td style="text-align: center"><b>Horror</b></td>
    <td style="text-align: center">45</td>
    <td style="text-align: center">15</td>
  </tr>
</table>
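One possible way to set up this test in R follows the same pattern as Example 1. The object names `snacks` and `snack_result` are chosen here for illustration; the counts come from the table above.

```r
# Counts from the Example 2 table, entered row by row
snacks <- matrix(
  c(55, 75,
    135, 155,
    85, 55,
    45, 15),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Action", "Comedy", "Family", "Horror"),
                  c("Snacks", "No Snacks"))
)

snack_result <- chisq.test(snacks)

# Data verification: inspect the expected counts for low cells
snack_result$expected
all(snack_result$expected >= 5)

# Test statistic and decision at the stated alpha = 0.05
snack_result$statistic
snack_result$p.value < 0.05  # TRUE means reject independence
```

Labelling the rows and columns with `dimnames` keeps the expected matrix readable when it is printed.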

## Example 3: Accept the Date

Using the **AccDate** variable in the **personality** data set, test at the $0.05$ level of significance whether Yes/No responses to the dating question depend upon biological sex.
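When the data arrive as one row per respondent rather than as a table of counts, `table()` can build the contingency table first. The sketch below uses a stand-in data frame, `personality_demo`, with invented values so that it runs on its own. The column name `Sex` is an assumption; only `AccDate` is named in the exercise, so the real data set may use a different name.

```r
# Stand-in for the real `personality` data (invented values, for illustration only).
# `Sex` is an assumed column name; only `AccDate` is named in the exercise.
personality_demo <- data.frame(
  Sex     = rep(c("F", "M"), each = 20),
  AccDate = c(rep("Yes", 6), rep("No", 14), rep("Yes", 13), rep("No", 7))
)

# Cross-tabulate the two categorical variables
acc_table <- table(personality_demo$Sex, personality_demo$AccDate)
acc_table

# For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default
acc_result <- chisq.test(acc_table)
acc_result$expected
acc_result$p.value < 0.05  # TRUE means reject independence at the 0.05 level
```

With the real data, the same two steps would apply after replacing `personality_demo` and `Sex` with the actual data frame and column name.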