---
title: 'Inference for many proportions (Chi-square) - tidy'
author: "Professor McNamara"
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
---
## Setup and packages
We use three packages in this course: `Lock5Data`, `tidyverse`, and `infer`. To load a package, call the `library()` function with the name of the package. I've put the code to load one package into the chunk below. Add the other two you need.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error = TRUE)
library(Lock5Data)
# put in the other two packages you need here
```
## Loading in data
As always, we'll load the example data, `GSS_clean.csv`. It should be inside the `data` folder of your RStudio Cloud project. We'll use the `read_csv()` function, which is loaded with `tidyverse`, to read in the data.
```{r data-load}
GSS <- read_csv("data/GSS_clean.csv")
```
## Chi-square tests
There are two main tasks we can use a chi-square test for: goodness-of-fit tests (for one categorical variable) and tests for association (between two categorical variables). We'll go through both of them.
To use a $\chi^2$ distribution as a distributional approximation, we need to check the condition that each of the expected counts is at least 5.
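To make the condition concrete, here is a minimal sketch of the check for a goodness-of-fit setting. The counts are hypothetical (they are not from the `GSS` data), and the names `observed`, `null_props`, and `expected` are just for illustration. Each expected count is the sample size times the proportion claimed by the null hypothesis, and the test statistic adds up $(\text{observed} - \text{expected})^2 / \text{expected}$ over the categories.

```{r}
# Hypothetical observed counts for a variable with three categories
# (made-up numbers, not the GSS data)
observed <- c(level_a = 30, level_b = 50, level_c = 40)

# Null hypothesis: all three categories are equally likely
null_props <- rep(1 / 3, length(observed))

# Expected count = sample size * null proportion
expected <- sum(observed) * null_props
expected

# Condition check: is every expected count at least 5?
expected_ok <- all(expected >= 5)
expected_ok

# Chi-square statistic, degrees of freedom, and p-value by hand
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Base R's chisq.test() should give the same statistic
base_fit <- chisq.test(observed, p = null_props)
c(by_hand = chi_sq, base_r = unname(base_fit$statistic))
```

Note that the condition is about the expected counts, not the observed ones, so a category with a small observed count can still be fine if its expected count is large enough.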
## Chi-square for goodness of fit
For this example, we are going to consider the variable `general_happiness` from the `GSS` data. I would like to test the following hypotheses:
$$
H_0: p_{\text{not too happy}} = p_{\text{pretty happy}} = p_{\text{very happy}} = \frac{1}{3} \\
H_A: \text{at least one of the proportions is different}
$$
Let's start by looking at a summary of the data.
```{r}
GSS |>
  drop_na(general_happiness) |>
  group_by(general_happiness) |>
  summarize(n = n()) |>
  mutate(prop = n / sum(n))
```
Are the conditions for inference met?
Now, we can use the `chisq_test()` function to do the test!
```{r}
GSS |>
  chisq_test(response = general_happiness)
```
What is the conclusion of the test?
## Chi-square for association
For a chi-square test for association, we are looking at the relationship between two categorical variables. For this example, let's consider the relationship between `marital_status` and `general_happiness`.
$$
H_0: \text{there is no association between marital status and general happiness}\\
H_A: \text{there is an association between marital status and general happiness}
$$
Again, we should check to ensure the conditions are met before proceeding.
```{r}
GSS |>
  drop_na(marital_status, general_happiness) |>
  group_by(marital_status, general_happiness) |>
  summarize(n = n())
```
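The summary above gives the observed counts, but the condition is about the expected counts. If there is no association, the expected count in each cell is (row total × column total) / grand total. Here is a minimal sketch of that calculation using a small hypothetical two-way table (made-up numbers, not the `GSS` data); the names `observed_tab` and `expected_tab` are just for illustration.

```{r}
# Hypothetical 2 x 2 table of observed counts (made-up numbers, not GSS)
observed_tab <- matrix(
  c(20, 30,
    10, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("group_1", "group_2"), outcome = c("yes", "no"))
)

# Expected count in each cell = row total * column total / grand total
expected_tab <- outer(rowSums(observed_tab), colSums(observed_tab)) / sum(observed_tab)
expected_tab

# Condition check: is every expected count at least 5?
expected_tab_ok <- all(expected_tab >= 5)
expected_tab_ok

# Degrees of freedom for a test of association: (rows - 1) * (columns - 1)
df_tab <- (nrow(observed_tab) - 1) * (ncol(observed_tab) - 1)

# Base R's chisq.test() stores the same expected counts
base_expected <- chisq.test(observed_tab, correct = FALSE)$expected
base_expected
```

The same idea should carry over to a table built from `marital_status` and `general_happiness`, with more rows and columns.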
Now, we can use `chisq_test()` to do the test!
```{r}
GSS |>
  chisq_test(response = general_happiness, explanatory = marital_status)
```
What is the conclusion of the test?