# Background

### Binomial regression setup

Distribution
- $Y_i \sim Bin(n_i, \pi_i)$
- $P_i = \frac{Y_i}{n_i}$
- $E[P_i] = \pi_i$
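
The mean of the observed proportion follows directly from the binomial mean $E[Y_i] = n_i \pi_i$. The variance is a standard binomial fact added here for context; it is not stated in the original notes.

$$
E[P_i] = \frac{E[Y_i]}{n_i} = \frac{n_i \pi_i}{n_i} = \pi_i,
\qquad
Var(P_i) = \frac{Var(Y_i)}{n_i^2} = \frac{\pi_i (1 - \pi_i)}{n_i}
$$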

Link function: $g(\pi_i) = x_i^T \beta = \eta_i$
- logit
    - <font color = 'blue'>$g(\pi_i) = \log \Big( \frac{\pi_i}{1 - \pi_i} \Big)$</font>
- probit
    - <font color = 'blue'>$g(\pi_i) = \Phi^{-1}(\pi_i)$</font>
- complementary log-log
    - <font color = 'blue'>$g(\pi_i) = \log[-\log(1 - \pi_i)]$</font>
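
As a rough illustration, the three link functions above can be evaluated with base R. The grid `pi_grid` and the data frame names `links` and `inv` are made up for this sketch and do not appear elsewhere in the notebook.

```r
# Evaluate the three link functions on a few probabilities (base R only).
pi_grid <- c(0.1, 0.5, 0.9)   # illustrative probabilities, not from the data

links <- data.frame(
    pi      = pi_grid,
    logit   = qlogis(pi_grid),          # log(pi / (1 - pi))
    probit  = qnorm(pi_grid),           # inverse standard normal CDF
    cloglog = log(-log(1 - pi_grid))    # complementary log-log
)
print(links)

# Each link is invertible, so pi can be recovered from eta.
inv <- data.frame(
    logit   = plogis(links$logit),
    probit  = pnorm(links$probit),
    cloglog = 1 - exp(-exp(links$cloglog))
)
print(inv)
```

The logit and probit links are symmetric around $\pi_i = 0.5$ (both map it to $\eta_i = 0$), while the complementary log-log link is not.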

# Set environment

In [6]:
library(tidyverse)
library(IRdisplay)

# Import data

In [19]:
dat_beetle = data.frame(
    dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
    n = c(59, 60, 62, 56, 63, 59, 62, 60),
    y = c( 6, 13, 18, 28, 52, 53, 61, 60)
)
dat_beetle = dat_beetle %>% mutate(n_y = n - y)

# display data
display_html("<font color = 'blue' size = 3>Table 7.2 Beetle mortality data</font>")
display(dat_beetle)

dose,n,y,n_y
1.6907,59,6,53
1.7242,60,13,47
1.7552,62,18,44
1.7842,56,28,28
1.8113,63,52,11
1.8369,59,53,6
1.861,62,61,1
1.8839,60,60,0


In [31]:
dat_beetle_n1 = as.data.frame(rbind(
    cbind(dose = rep(dat_beetle[, "dose"], dat_beetle[, "y"]  ), n = 1, y = 1),
    cbind(dose = rep(dat_beetle[, "dose"], dat_beetle[, "n_y"]), n = 1, y = 0)
))

# display data
display_html("<font color = 'blue' size = 3>$n_i = 1$ situation</font>")
display(dat_beetle_n1 %>% head(10))
display(dat_beetle_n1 %>% tail(10))

dose,n,y
1.6907,1,1
1.6907,1,1
1.6907,1,1
1.6907,1,1
1.6907,1,1
1.6907,1,1
1.7242,1,1
1.7242,1,1
1.7242,1,1
1.7242,1,1


Unnamed: 0,dose,n,y
472,1.8113,1,0
473,1.8113,1,0
474,1.8113,1,0
475,1.8369,1,0
476,1.8369,1,0
477,1.8369,1,0
478,1.8369,1,0
479,1.8369,1,0
480,1.8369,1,0
481,1.861,1,0

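One way to check the expansion to the $n_i = 1$ layout is to collapse it again and compare with the grouped table. The sketch below rebuilds both data frames with base R so that it runs on its own; `dat_long` and `regrouped` are names invented for this check.

```r
# Grouped beetle mortality data (same values as Table 7.2 above).
dat_beetle <- data.frame(
    dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
    n    = c(59, 60, 62, 56, 63, 59, 62, 60),
    y    = c( 6, 13, 18, 28, 52, 53, 61, 60)
)

# One row per beetle: y = 1 for the killed beetles, y = 0 for the survivors.
dat_long <- data.frame(
    dose = c(rep(dat_beetle$dose, dat_beetle$y),
             rep(dat_beetle$dose, dat_beetle$n - dat_beetle$y)),
    y    = c(rep(1, sum(dat_beetle$y)),
             rep(0, sum(dat_beetle$n - dat_beetle$y)))
)

# Collapse back to one row per dose and compare with the original table.
regrouped <- data.frame(
    dose = sort(unique(dat_long$dose)),
    n    = as.vector(tapply(dat_long$y, dat_long$dose, length)),
    y    = as.vector(tapply(dat_long$y, dat_long$dose, sum))
)
print(nrow(dat_long))   # total number of beetles
print(regrouped)
```

If the expansion is correct, `regrouped` matches `dat_beetle` row for row and `dat_long` has one row per beetle.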

# Fit model

### <font color = 'blue'>Grouped data situation</font>

In [23]:
beetle_logit = glm(cbind(y, n - y) ~ dose, data = dat_beetle, family = binomial(link = 'logit'))
summary(beetle_logit)


Call:
glm(formula = cbind(y, n - y) ~ dose, family = binomial(link = "logit"), 
    data = dat_beetle)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5941  -0.3944   0.8329   1.2592   1.5940  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -60.717      5.181  -11.72   <2e-16 ***
dose          34.270      2.912   11.77   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 284.202  on 7  degrees of freedom
Residual deviance:  11.232  on 6  degrees of freedom
AIC: 41.43

Number of Fisher Scoring iterations: 4
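
The Background section lists three link functions, but only the logit fit is shown. A minimal sketch of fitting all three to the grouped data follows; the names `link_names`, `fits`, and `comparison` are assumptions made for this example.

```r
# Grouped beetle mortality data (same values as Table 7.2 above).
dat_beetle <- data.frame(
    dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
    n    = c(59, 60, 62, 56, 63, 59, 62, 60),
    y    = c( 6, 13, 18, 28, 52, 53, 61, 60)
)

# Fit the same linear predictor under each link function.
link_names <- c("logit", "probit", "cloglog")
fits <- lapply(link_names, function(lk) {
    glm(cbind(y, n - y) ~ dose, data = dat_beetle,
        family = binomial(link = lk))
})
names(fits) <- link_names

# Collect residual deviance and AIC side by side.
comparison <- data.frame(
    link     = link_names,
    deviance = sapply(fits, deviance),
    aic      = sapply(fits, AIC)
)
print(comparison)
```

All three fits use the same data and the same number of parameters, so their residual deviances can be compared directly.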


### <font color = 'blue'>$n_i = 1$ situation</font>

In [33]:
beetle_logit2 = glm(cbind(y, n - y) ~ dose, data = dat_beetle_n1, family = binomial(link = 'logit'))
summary(beetle_logit2)


Call:
glm(formula = cbind(y, n - y) ~ dose, family = binomial(link = "logit"), 
    data = dat_beetle_n1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4922  -0.5986   0.2058   0.4512   2.3820  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -60.717      5.181  -11.72   <2e-16 ***
dose          34.270      2.912   11.77   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 645.44  on 480  degrees of freedom
Residual deviance: 372.47  on 479  degrees of freedom
AIC: 376.47

Number of Fisher Scoring iterations: 5


In [43]:
beetle_logit_n1 = glm(y ~ dose, data = dat_beetle_n1, family = binomial(link = 'logit'))
summary(beetle_logit_n1)


Call:
glm(formula = y ~ dose, family = binomial(link = "logit"), data = dat_beetle_n1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4922  -0.5986   0.2058   0.4512   2.3820  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -60.717      5.181  -11.72   <2e-16 ***
dose          34.270      2.912   11.77   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 645.44  on 480  degrees of freedom
Residual deviance: 372.47  on 479  degrees of freedom
AIC: 376.47

Number of Fisher Scoring iterations: 5


**Observation 01**: the coefficient estimates (and their standard errors) are identical for the grouped fit and the $n_i = 1$ fits.

<font color = 'blue' size = 3>Grouped data situation</font>


<font color = 'blue' size = 3>$n_i = 1$ situation</font>



In [44]:
display_html("<font color = 'blue' size = 3>Grouped data situation</font>")
print(beetle_logit$coefficients)

display_html("<font color = 'blue' size = 3>$n_i = 1$ situation</font>")
print(beetle_logit_n1$coefficients)

(Intercept)        dose 
  -60.71745    34.27033 


(Intercept)        dose 
  -60.71745    34.27033 
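
A short derivation suggests why the two layouts give the same estimates. Writing $L_G$ for the grouped likelihood and $L_U$ for the $n_i = 1$ likelihood (both symbols are introduced here for illustration), with $i$ indexing the dose groups:

$$
L_G(\beta) = \prod_{i} \binom{n_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i},
\qquad
L_U(\beta) = \prod_{i} \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i}
$$

$$
\log L_G(\beta) = \log L_U(\beta) + \sum_{i} \log \binom{n_i}{y_i}
$$

The extra term does not depend on $\beta$, so both log-likelihoods have the same maximizer and the same curvature, which gives the same estimates and standard errors.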


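**Observation 02**: the residual deviances differ (11.23 for the grouped data, 372.47 for the $n_i = 1$ data), because each is measured against a different saturated model: one parameter per dose group versus one parameter per beetle.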

In [45]:
display_html("<font color = 'blue' size = 3>Grouped data situation</font>")
print(beetle_logit$deviance)

display_html("<font color = 'blue' size = 3>$n_i = 1$ situation</font>")
print(beetle_logit_n1$deviance)

[1] 11.23223


[1] 372.4708
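
Although the residual deviances differ, the summaries above suggest that the drop from null to residual deviance is the same in both layouts (284.202 - 11.232 and 645.44 - 372.47 are both about 272.97). The sketch below checks this on self-contained copies of the data; `fit_grouped`, `fit_n1`, `drop_grouped`, and `drop_n1` are names made up for this example.

```r
# Grouped beetle mortality data (same values as Table 7.2 above).
dat_beetle <- data.frame(
    dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
    n    = c(59, 60, 62, 56, 63, 59, 62, 60),
    y    = c( 6, 13, 18, 28, 52, 53, 61, 60)
)

# Expanded layout: one row per beetle (n_i = 1).
dat_beetle_n1 <- data.frame(
    dose = c(rep(dat_beetle$dose, dat_beetle$y),
             rep(dat_beetle$dose, dat_beetle$n - dat_beetle$y)),
    y    = c(rep(1, sum(dat_beetle$y)),
             rep(0, sum(dat_beetle$n - dat_beetle$y)))
)

fit_grouped <- glm(cbind(y, n - y) ~ dose, data = dat_beetle,
                   family = binomial(link = "logit"))
fit_n1      <- glm(y ~ dose, data = dat_beetle_n1,
                   family = binomial(link = "logit"))

# Residual deviances differ, but the reduction attributable to dose does not.
drop_grouped <- fit_grouped$null.deviance - fit_grouped$deviance
drop_n1      <- fit_n1$null.deviance - fit_n1$deviance
print(c(grouped = drop_grouped, n1 = drop_n1))
```

The likelihood-ratio comparison of the dose model against the intercept-only model is therefore unaffected by the data layout; only the goodness-of-fit reading of the residual deviance changes.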
