# Contingency tables and $\chi^2$

In this notebook we will look how to compute the $\chi^2$ test-statistic for a 2x2 contingency table. The example data comes from the [SOM Survey](https://www.gu.se/en/som-institute/the-som-surveys) performed annually by GÃ¶teborgs universitet. We will look at two variables from the 2015 survey and will treat the 1499 respondents as a random sample of the population. The dataset we will work with has been slightly edited.

Start by loading the datasets.

In [None]:
options(repr.plot.width=14, repr.plot.height=8)
suppressMessages(require(dplyr))
suppressMessages(require(ggplot2))
data <- readRDS("data_from_som2015.rds")
names(data)

Cross-tabulate and compute the expected cell-values based on independence of margin probabilities 

In [None]:
## table w margins
tab <- table(data$sex,data$faith)
xtab <- addmargins(tab)
## expected values under independence 
expected <- function(xtab){
    expected <- matrix(nrow=2,ncol=2)
    for (i in 1:2)
        for (j in 1:2)
            expected[i,j] <- xtab[i,3]/xtab[3,3]*xtab[3,j]
    return(expected)
}
print("observed")
print(xtab)
print("expected")
print(addmargins(expected(xtab)))

In [None]:
## next we compute the chi-square 
mychi <- function(xtab){
    expect <- expected(xtab)
    chi2 <- 0
    for (i in 1:2)
        for (j in 1:2)
            chi2 <- chi2+(expect[i,j]- xtab[i,j])**2/expect[i,j]
    return(chi2)
}
print(mychi(xtab))


To know if this value is large or small, we will next simulate a number of contingency tables having the same marginals. For each of these tables we compute the $\chi^2$. This is the distribution under the null-hypothesis.

In [None]:
## use the function r2dtable to generate random samples from 2x2 contingenct tables
rtab <- addmargins(r2dtable(1,margin.table(tab,1),margin.table(tab,2))[[1]])
mychi(rtab)
print(rtab)

In [None]:
## here we generate many such random matrices and compute chi2
Nsim <- 10000
stat<- vector(length=Nsim)
for (i in 1:Nsim){
    rtab <- addmargins(r2dtable(1,margin.table(tab,1),margin.table(tab,2))[[1]])
    stat[i]<-mychi(rtab)
}

In [None]:
hist(stat,30)

In [None]:
sum(stat > mychi(xtab))/Nsim