# Contingency tables and logistic regression

In this notebook we will look at a simple example of logistic regression and compare the results with those from an analysis of a contingency table. The example data comes from the [SOM Survey](https://www.gu.se/en/som-institute/the-som-surveys), conducted annually by Göteborgs universitet (the University of Gothenburg). We will look at two variables from the 2015 survey, `sex` and `faith`, and will treat the 1499 respondents as a random sample of the population. The dataset we will work with has been slightly edited.

Start by loading the packages and the dataset.

In [1]:
options(repr.plot.width=14, repr.plot.height=8)
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
## read the edited SOM 2015 extract
data <- readRDS("data_from_som2015.rds")
names(data)
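If `data_from_som2015.rds` is not at hand, the two variables used in the rest of this notebook can be rebuilt from the observed counts in the cross-tabulation below (470, 228, 413, 388). This is a minimal sketch under stated assumptions: it recovers only `sex` and `faith`, with the level labels shown in the table, and none of the other survey columns. Because it assigns to `data`, use it only in place of the `readRDS()` line.

```r
## Hypothetical stand-in for data_from_som2015.rds (assumption): rebuild one
## row per respondent from the four cell counts of the sex x faith table.
## Only these two variables are recovered; other survey columns are not.
counts <- data.frame(
  sex   = c("Man", "Man", "Woman", "Woman"),
  faith = c("No",  "Yes", "No",    "Yes"),
  n     = c(470,   228,   413,     388),
  stringsAsFactors = FALSE
)
## repeat each cell's row n times to get individual-level records
data <- counts[rep(seq_len(nrow(counts)), times = counts$n), c("sex", "faith")]
rownames(data) <- NULL
## fix the level order so "Man" and "No" are the reference levels, as in the table
data$sex   <- factor(data$sex,   levels = c("Man", "Woman"))
data$faith <- factor(data$faith, levels = c("No", "Yes"))
print(table(data$sex, data$faith))
```

Since the analysis below only uses the 2×2 counts, every table and model output should be reproduced by this stand-in.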

Cross-tabulate `sex` against `faith`.

In [18]:
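## cross-tabulate sex (rows) by faith (columns), with margins
tab <- table(data$sex, data$faith)
xtab <- addmargins(tab)
print("observed")
print(xtab)
## row proportions: share answering No/Yes within each sex
print("proportions")
ptab <- prop.table(tab, 1)
print(ptab)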

[1] "observed"
       
          No  Yes  Sum
  Man    470  228  698
  Woman  413  388  801
  Sum    883  616 1499
[1] "proportions"
       
               No       Yes
  Man   0.6733524 0.3266476
  Woman 0.5156055 0.4843945


In [26]:
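## compute the odds ratio (OR) from the contingency table
## odds = P(Yes) / P(No) within each sex (column 2 of ptab is "Yes")
odds_men <- ptab[1,2]/(1-ptab[1,2])
print(paste("odds for men: ", round(odds_men,2)))
odds_women <- ptab[2,2]/(1-ptab[2,2])
print(paste("odds for women: ", round(odds_women,2)))
## odds ratio: odds for women relative to odds for men
odds_ratio <- odds_women/odds_men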

[1] "odds for men:  0.49"
[1] "odds for women:  0.94"
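The same numbers follow directly from the counts in the observed table. Within each sex the proportions share a denominator (698 men, 801 women), so the odds reduce to the number answering Yes divided by the number answering No, and the odds ratio is the ratio of the two odds:

$$
\text{odds}_{\text{men}} = \frac{228/698}{470/698} = \frac{228}{470} \approx 0.485,
\qquad
\text{odds}_{\text{women}} = \frac{388}{413} \approx 0.939,
$$

$$
\text{OR} = \frac{388/413}{228/470} = \frac{388 \cdot 470}{413 \cdot 228} = \frac{182360}{94164} \approx 1.937.
$$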


In [14]:
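## fit a logistic regression of faith on sex; with a factor response,
## glm treats the first level ("No") as failure and models P(faith == "Yes")
mod <- glm(as.factor(faith)~1+sex,data=data,family=binomial)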
summary(mod)


Call:
glm(formula = as.factor(faith) ~ 1 + sex, family = binomial, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1510  -1.1510  -0.8894   1.2040   1.4959  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.72339    0.08071  -8.963  < 2e-16 ***
sexWoman     0.66094    0.10730   6.160 7.27e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2030.2  on 1498  degrees of freedom
Residual deviance: 1991.6  on 1497  degrees of freedom
AIC: 1995.6

Number of Fisher Scoring iterations: 4
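The drop in deviance reported above, 2030.2 - 1991.6 ≈ 38.6 on 1498 - 1497 = 1 degree of freedom, is the likelihood-ratio test of the `sex` effect. For a 2×2 table this coincides with the G² (likelihood-ratio) test of independence computed from the contingency table alone. A minimal sketch, hard-coding the observed counts from the cross-tabulation rather than reading the survey file:

```r
## Likelihood-ratio (G^2) test of independence computed directly from the
## 2x2 table, to compare with the drop in deviance reported by glm().
observed <- matrix(c(470, 228,
                     413, 388),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(sex = c("Man", "Woman"), faith = c("No", "Yes")))
## expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
## G^2 = 2 * sum(O * log(O / E)), compared with a chi-squared distribution on 1 df
G2 <- 2 * sum(observed * log(observed / expected))
p_value <- pchisq(G2, df = 1, lower.tail = FALSE)
print(round(G2, 2))
print(signif(p_value, 3))
## with the fitted model in memory, the same statistic is
## mod$null.deviance - mod$deviance
```

Note that `chisq.test()` reports Pearson's X² statistic, which for this table is close to G² but not identical; it is the G² form that matches the deviance difference.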


In [17]:
## exponentiate the coefficients: the intercept becomes the odds for men
## (the reference level) and sexWoman becomes the odds ratio
print(round(exp(mod$coefficients),3))

(Intercept)    sexWoman 
      0.485       1.937 
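Why the two analyses agree: with a single binary predictor, the model fitted above is

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 \cdot \text{Woman},
$$

where $p$ is the probability of answering Yes and $\text{Woman}$ is 1 for women and 0 for men. The log odds are therefore $\beta_0$ for men and $\beta_0 + \beta_1$ for women, so

$$
e^{\beta_0} = \text{odds}_{\text{men}}, \qquad
e^{\beta_0 + \beta_1} = \text{odds}_{\text{women}}, \qquad
e^{\beta_1} = \frac{\text{odds}_{\text{women}}}{\text{odds}_{\text{men}}} = \text{OR}.
$$

With the estimates above, $e^{-0.72339} \approx 0.485$, $e^{0.66094} \approx 1.937$ and $e^{-0.72339 + 0.66094} = e^{-0.06245} \approx 0.939$. Because the model has one parameter per group, its fitted probabilities equal the observed proportions, so the match with the table is exact up to rounding.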


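Compare the exponentiated coefficient for sex (`sexWoman`, 1.937) with the odds ratio calculated from the contingency table. Likewise, the exponentiated intercept (0.485) is the odds for men computed earlier (0.49 after rounding).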


In [31]:
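print(round(odds_ratio,3))
## intercept + sexWoman coefficient = log odds for women; exponentiate to get the odds
print(round(exp(sum(mod$coefficients)),3))

[1] 1.937
[1] 0.939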
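The agreement extends beyond the point estimate. The standard error of the log odds ratio can be computed from the four cell counts alone using Woolf's formula, $\sqrt{1/a + 1/b + 1/c + 1/d}$, and for this table it reproduces the `Std. Error` of 0.10730 reported for `sexWoman` in the model summary. A minimal sketch with the counts hard-coded from the observed table, which also forms a 95% Wald confidence interval for the odds ratio:

```r
## Woolf's standard error for the log odds ratio, computed from the four
## cell counts of the sex x faith table, and a 95% Wald confidence interval.
n_man_no   <- 470; n_man_yes   <- 228
n_woman_no <- 413; n_woman_yes <- 388

## log odds ratio: log(odds for women / odds for men)
log_or <- log((n_woman_yes / n_woman_no) / (n_man_yes / n_man_no))
## SE = sqrt(1/a + 1/b + 1/c + 1/d) over all four cells
se_log_or <- sqrt(1 / n_man_no + 1 / n_man_yes + 1 / n_woman_no + 1 / n_woman_yes)
## interval on the log scale, then back-transformed to the odds-ratio scale
z <- qnorm(0.975)
ci <- exp(log_or + c(-1, 1) * z * se_log_or)

print(round(c(log_or = log_or, se = se_log_or), 5))
print(round(c(OR = exp(log_or), lower = ci[1], upper = ci[2]), 3))
```

With the fitted model in memory, `exp(confint.default(mod))` should give the same Wald interval for `sexWoman`, since `confint.default()` is built from the estimate and its standard error; `confint(mod)` instead returns profile-likelihood intervals, which can differ slightly.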
