# [A GLM Example](https://www.stat.umn.edu/geyer/5931/mle/seed2.pdf)

data: https://www.stat.umn.edu/geyer/5931/mle/seeds.txt

In [9]:
mydata = read.table("./data/seeds.txt")
# Convert the categorical columns of mydata to factors.
mydata = within(mydata, {
    vegtype = factor(vegtype)
    burn01  = factor(burn01)
    burn02  = factor(burn02)
    burn03  = factor(burn03)
})
summary(mydata)

         vegtype   burn01    burn02    burn03      totalseeds   
 lab         :15   lab: 15   lab: 15   lab: 15   Min.   : 96.0  
 oldfieldcool:72   no :102   no :114   no :126   1st Qu.:100.0  
 oldfieldwarm:18   yes: 60   yes: 48   yes: 36   Median :100.0  
 plantcool   :36                                 Mean   :100.2  
 plantwarm   :36                                 3rd Qu.:100.0  
                                                 Max.   :111.0  
   seedlings     
 Min.   : 0.000  
 1st Qu.: 0.000  
 Median : 1.000  
 Mean   : 3.475  
 3rd Qu.: 4.000  
 Max.   :29.000  

In [10]:
head(mydata)

  vegtype      burn01 burn02 burn03 totalseeds seedlings
  <fct>        <fct>  <fct>  <fct>       <int>     <int>
1 oldfieldcool yes    no     no            100         5
2 oldfieldcool yes    no     no            100         3
3 oldfieldcool yes    no     no            100         4
4 oldfieldcool no     no     no            100         0
5 oldfieldcool no     no     no            100         3
6 oldfieldcool no     no     no            100         1


- Description
    - `seedlings`: the number of seedlings that sprouted
    - `totalseeds`: the total number of seeds, used as $n$ below
    - `vegtype`: type of field
        - `oldfieldwarm`, `oldfieldcool`: `oldfield`, warm or cool
        - `plantwarm`, `plantcool`: `plant`, warm or cool
        - `lab`: plants grown in the lab
    - burn
        - `burn01`: a field was burned in 2001
        - `burn02`: a field was burned in 2002
        - `burn03`: a field was burned in 2003
            - Owing to a mistake in the experimental design, `burn03` is completely confounded with `vegtype` and hence **has been omitted from the analysis** (this isn’t right, but it is not clear what else to do).
- regression thinking:
    - response: `seedlings`, assumed to be $\text{Poisson}(n\lambda)$, where $\lambda$ is the expected number of seedlings per seed
    - predictors: `vegtype` and the three `burn` variables (`burn03` is dropped from the fits below)
- Poisson regression
    - log link
    - $\eta = \log(n) + \log(\lambda)$
    - $\log(n)$ is just a known constant
        - $n = \text{totalseeds}$
        - in R, set as an offset: `offset(log(totalseeds))`
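
The role of the offset can be written out as a short derivation. This is a sketch in standard GLM notation; the observation index $i$, the predictor vector $x_i$, and the coefficient vector $\beta$ are symbols assumed here and do not appear in the notes above.

$$
\begin{aligned}
Y_i &\sim \text{Poisson}(n_i \lambda_i), \\
\log E(Y_i) &= \log(n_i \lambda_i) = \log(n_i) + \log(\lambda_i), \\
\log(\lambda_i) &= x_i^\top \beta
\quad\Longrightarrow\quad
\eta_i = \log(n_i) + x_i^\top \beta .
\end{aligned}
$$

Because $\log(n_i)$ enters the linear predictor with a fixed coefficient of 1, it is not estimated; only $\beta$ is, and $\beta$ describes the per-seed rate $\lambda_i$ rather than the raw count.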
        
       

## Fitting Poisson Regression Models

In [11]:
out = glm(
    seedlings ~ vegtype + burn01 + burn02 + offset(log(totalseeds)),
    data = mydata,
    family = poisson
)

summary(out)


Call:
glm(formula = seedlings ~ vegtype + burn01 + burn02 + offset(log(totalseeds)), 
    family = poisson, data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7099  -1.7682  -0.7277   0.7292   4.7376  

Coefficients: (2 not defined because of singularities)
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -1.63528    0.05793 -28.229  < 2e-16 ***
vegtypeoldfieldcool -1.11844    0.19266  -5.805 6.43e-09 ***
vegtypeoldfieldwarm -0.98491    0.22438  -4.390 1.14e-05 ***
vegtypeplantcool    -2.53154    0.27442  -9.225  < 2e-16 ***
vegtypeplantwarm    -1.72796    0.22884  -7.551 4.32e-14 ***
burn01no            -0.68432    0.15289  -4.476 7.61e-06 ***
burn01yes                 NA         NA      NA       NA    
burn02no            -0.72038    0.15623  -4.611 4.01e-06 ***
burn02yes                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for po

In [13]:
# An alternative way to handle the offset is the optional `offset` argument
# of glm(), rather than the offset() function in the formula.

out.too = glm(
    seedlings ~ vegtype + burn01 + burn02,
    offset = log(totalseeds),
    data = mydata,
    family = poisson
)

summary(out.too)

# The two fits should give the same coefficient estimates.
all.equal(coefficients(out), coefficients(out.too))


Call:
glm(formula = seedlings ~ vegtype + burn01 + burn02, family = poisson, 
    data = mydata, offset = log(totalseeds))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7099  -1.7682  -0.7277   0.7292   4.7376  

Coefficients: (2 not defined because of singularities)
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -1.63528    0.05793 -28.229  < 2e-16 ***
vegtypeoldfieldcool -1.11844    0.19266  -5.805 6.43e-09 ***
vegtypeoldfieldwarm -0.98491    0.22438  -4.390 1.14e-05 ***
vegtypeplantcool    -2.53154    0.27442  -9.225  < 2e-16 ***
vegtypeplantwarm    -1.72796    0.22884  -7.551 4.32e-14 ***
burn01no            -0.68432    0.15289  -4.476 7.61e-06 ***
burn01yes                 NA         NA      NA       NA    
burn02no            -0.72038    0.15623  -4.611 4.01e-06 ***
burn02yes                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for po
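
The equivalence of the two offset specifications can also be checked without `seeds.txt`. The sketch below uses simulated data; the data frame `sim`, its columns `group`, `total`, and `count`, and the rates in `rates` are all hypothetical and chosen only for illustration.

```r
# Hypothetical simulated data; nothing here comes from seeds.txt.
set.seed(1)
n_obs <- 200
group <- factor(sample(c("a", "b", "c"), n_obs, replace = TRUE))
total <- sample(90:110, n_obs, replace = TRUE)   # known exposure n
rates <- c(a = 0.02, b = 0.05, c = 0.10)         # assumed per-unit rates
lambda <- unname(rates[as.character(group)])
count <- rpois(n_obs, total * lambda)            # counts ~ Poisson(n * lambda)
sim <- data.frame(group, total, count)

# Offset written inside the formula.
fit_formula <- glm(count ~ group + offset(log(total)),
                   family = poisson, data = sim)

# Offset passed through the `offset` argument of glm().
fit_argument <- glm(count ~ group, offset = log(total),
                    family = poisson, data = sim)

# Both fits are expected to give the same coefficients.
all.equal(coef(fit_formula), coef(fit_argument))

# With the offset in place, the exponentiated intercept estimates the
# per-unit rate for the reference level "a".
exp(coef(fit_formula)[["(Intercept)"]])
```

One practical difference, as far as I know: `predict()` with `newdata` picks up an offset written in the formula, while an offset passed as an argument may need more care, so the formula version is often the safer habit.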

## More Model Fitting

In [14]:
out.noveg <- glm(
    seedlings ~ burn02 + burn01 + offset(log(totalseeds)),
    data = mydata, family = poisson)
summary(out.noveg)


Call:
glm(formula = seedlings ~ burn02 + burn01 + offset(log(totalseeds)), 
    family = poisson, data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2267  -1.5275  -1.0689   0.4755   5.7121  

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.63528    0.05793 -28.229  < 2e-16 ***
burn02no    -2.15896    0.10375 -20.810  < 2e-16 ***
burn02yes   -1.40519    0.18719  -7.507 6.06e-14 ***
burn01no    -0.65678    0.15258  -4.305 1.67e-05 ***
burn01yes         NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1268.0  on 176  degrees of freedom
Residual deviance:  573.4  on 173  degrees of freedom
AIC: 900.47

Number of Fisher Scoring iterations: 6


In [15]:
anova(out.noveg, out, test = "Chisq")

  Resid. Df Resid. Dev Df Deviance     Pr(>Chi)
1       173   573.4023
2       170   501.1547  3  72.2476 1.408888e-15


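**The analysis of deviance test (also called the likelihood ratio test) rejects the smaller model: dropping `vegtype` raises the deviance by 72.25 on 3 degrees of freedom ($P \approx 1.4 \times 10^{-15}$), so we cannot drop `vegtype`.**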
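
The test statistic and P-value can be reproduced by hand from the two residual deviances. This is a minimal sketch that hard-codes the rounded values printed by `anova()` above; the variable names are hypothetical, and the result should agree with the `Pr(>Chi)` column only up to that rounding.

```r
# Residual deviances and degrees of freedom copied from the anova() table.
dev_reduced <- 573.4023   # out.noveg (without vegtype)
df_reduced  <- 173
dev_full    <- 501.1547   # out (with vegtype)
df_full     <- 170

# Likelihood ratio statistic: the drop in deviance between nested models.
lr_stat <- dev_reduced - dev_full
lr_df   <- df_reduced - df_full

# Upper-tail chi-square probability; lower.tail = FALSE keeps precision
# for very small P-values.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```

With the fitted models in hand, the same quantities are available as `deviance(out.noveg) - deviance(out)` and `df.residual(out.noveg) - df.residual(out)`.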