## GLM Script: First-Principles Implementation of GLMs ##

In this notebook I compare GLMs implemented from first principles, using Newton-Raphson iterations, with base R's `glm()` for three models: linear, logistic and Poisson regression.
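All three functions below apply the same Newton-Raphson update and differ only in the gradient $\nabla$ and Hessian $H$ they use. The linear case works with the sum of squares $\tfrac12\lVert y - X\beta\rVert^2$, the logistic case with the negative log-likelihood, and the Poisson case with the log-likelihood itself; the update has the same form in each. Written out from the code, with $k$ = `ncol(X)` and $n$ = `nrow(X)`:

$$
\beta^{(t+1)} = \beta^{(t)} - H^{-1}\nabla
$$

$$
\begin{aligned}
\text{Linear:}\quad & \nabla = X^\top X\beta - X^\top y, & H &= X^\top X,\\
\text{Logistic:}\quad & \nabla = X^\top (p - y), & H &= X^\top W X,\quad W = \operatorname{diag}\{p_i(1-p_i)\},\quad p = \frac{1}{1+e^{-X\beta}},\\
\text{Poisson:}\quad & \nabla = X^\top (y - \mu), & H &= -X^\top W X,\quad W = \operatorname{diag}\{\mu_i\},\quad \mu = e^{X\beta}.
\end{aligned}
$$

$$
\operatorname{SE}(\hat\beta_j) = \sqrt{\hat\phi\,\big[(X^\top W X)^{-1}\big]_{jj}},\qquad
\hat\phi = \begin{cases} \mathrm{RSS}/(n-k), & \text{linear } (W = I),\\ 1, & \text{logistic and Poisson.}\end{cases}
$$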

### GLM Linear Regression ###

In [1]:
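glm_linear<-function(X,y)
{
  X<-as.matrix(X)
  beta<-rep(0,ncol(X))
  # Hessian of the objective 0.5*||y - X beta||^2; it does not depend on beta
  H=t(X)%*%X
  # Newton-Raphson updates: the objective is quadratic, so the first step already
  # reaches the least-squares solution and later steps leave it unchanged
  for(i in 1:29)
  {
    nabla<-H%*%beta-t(X)%*%y   # gradient X'X beta - X'y
    beta=beta-(solve(H)%*%nabla)
  }
  # Residual sum of squares and the unbiased variance estimate RSS/(n - k)
  rss<-t(y-X%*%beta)%*%(y-X%*%beta)
  sigma_2=rss/(nrow(X)-ncol(X))
  # Gaussian log-likelihood evaluated at sigma_2 (logLik() uses the ML estimate RSS/n)
  log_lik_constants=-0.5*nrow(X)*log(2*pi)-nrow(X)*log(sqrt(sigma_2))
  log_lik=log_lik_constants-(0.5*rss)/(sigma_2)
  # Standard errors: square roots of the diagonal of sigma_2 * (X'X)^{-1}
  SE=sqrt(diag(solve(H))*as.numeric(sigma_2))
  results<-list(Coefficients=beta,Log_Lik=log_lik,Std.Err=SE)
  return(results)
}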
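Because the least-squares objective is quadratic, its Hessian $X^\top X$ does not depend on $\beta$, so a single Newton step from $\beta = 0$ already lands on the normal-equations solution; the remaining 28 iterations in `glm_linear()` only confirm it. A minimal sketch checking this, using the built-in `mtcars` data as an assumption (HousePrices.csv is not bundled with R); `X_demo`, `y_demo` and the other names are illustrative:

```r
# Illustrative data: built-in mtcars (not the notebook's HousePrices.csv)
X_demo <- cbind(ones = 1, wt = mtcars$wt, hp = mtcars$hp)
y_demo <- mtcars$mpg

H <- crossprod(X_demo)                              # X'X, constant in beta
beta0 <- rep(0, ncol(X_demo))
grad0 <- H %*% beta0 - crossprod(X_demo, y_demo)    # gradient X'X beta - X'y at beta = 0
beta1 <- drop(beta0 - solve(H, grad0))              # one Newton step

# Normal equations X'X beta = X'y, and lm() as a reference
beta_ols <- drop(solve(H, crossprod(X_demo, y_demo)))
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

print(cbind(one_newton_step = beta1, normal_equations = beta_ols, lm = coef(fit_lm)))
```

In practice `solve(H, t(X) %*% y)` is all the linear case needs; the loop is kept in `glm_linear()` so that its structure matches the logistic and Poisson versions.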

In [3]:
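## Reading data ##
# Author's local data folder: change this to wherever HousePrices.csv, DeathPenalty.csv and poisson_sim.csv are stored
setwd("/Users/gunnvantsaini/Documents/Ebooks/Programming and Statistical Packages/R/Data")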
data=read.csv("HousePrices.csv")
data$ones=1
X=data[c("ones","SqFt","Bedrooms")]
y=data[,"Price"]
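Adding a `ones` column by hand works; `model.matrix()` builds the same kind of design matrix from a model formula, names the intercept column `(Intercept)`, and also expands factor variables into indicator columns. For the house-price data the equivalent call would be `model.matrix(Price ~ SqFt + Bedrooms, data = data)`; the sketch below uses the built-in `mtcars` data (an assumption) so it runs without the CSV file:

```r
# Design matrix from a formula; the intercept column is added automatically and
# factor(cyl) (levels 4, 6, 8) becomes two indicator columns with 4 as the baseline
X_mm <- model.matrix(mpg ~ wt + factor(cyl), data = mtcars)
y_mm <- mtcars$mpg
print(colnames(X_mm))
print(head(X_mm, 3))
```

The result is an ordinary numeric matrix, so it can be passed as `X` to the functions in this notebook.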

In [5]:
print(glm_linear(X,y))

$Coefficients
                [,1]
ones     -6367.59670
SqFt        49.49886
Bedrooms 12486.05780

$Log_Lik
         [,1]
[1,] -1454.63

$Std.Err
       ones        SqFt    Bedrooms 
17827.91479    10.11226  2947.13432 



In [6]:
mod=glm('Price~SqFt+Bedrooms',family ="gaussian" ,data=data)
summary(mod)


Call:
glm(formula = "Price~SqFt+Bedrooms", family = "gaussian", data = data)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-45993  -15402   -1446   11041   48352  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -6367.60   17827.91  -0.357    0.722    
SqFt           49.50      10.11   4.895 2.97e-06 ***
Bedrooms    12486.06    2947.13   4.237 4.36e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 445254299)

    Null deviance: 9.1685e+10  on 127  degrees of freedom
Residual deviance: 5.5657e+10  on 125  degrees of freedom
AIC: 2917.2

Number of Fisher Scoring iterations: 2
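`glm_linear()` plugs the unbiased variance estimate RSS/(n - k), with k = `ncol(X)`, into the Gaussian log-likelihood, whereas `logLik()` for a gaussian `glm` uses the maximum-likelihood estimate RSS/n. The two values differ slightly, and the gap shrinks as n grows relative to k. The printed AIC is tied to the latter through AIC = -2 logLik + 2(k + 1), the extra parameter being the variance. A minimal check on the built-in `mtcars` data (an illustrative assumption, not the notebook's data):

```r
fit_g <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)
n <- nrow(mtcars)
k <- length(coef(fit_g))
rss <- sum(residuals(fit_g, type = "response")^2)

# Log-likelihood at the ML variance estimate RSS / n (what logLik() reports)
ll_ml <- -0.5 * n * (log(2 * pi * rss / n) + 1)

# Log-likelihood at the unbiased estimate RSS / (n - k), as computed in glm_linear()
s2 <- rss / (n - k)
ll_unbiased <- -0.5 * n * log(2 * pi * s2) - 0.5 * rss / s2

print(c(logLik = as.numeric(logLik(fit_g)), ml = ll_ml, unbiased = ll_unbiased))
```

Since RSS/n maximizes the likelihood over the variance, the unbiased-variance version is always slightly lower.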


### GLM Logistic ###

In [7]:
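glm_logistc<-function(X,y)
{
  X<-as.matrix(X)
  beta<-rep(0,ncol(X))
  # 30 Newton-Raphson iterations on the negative log-likelihood
  # (with the canonical logit link this is the same as Fisher scoring)
  for(i in 1:30)
  {
    p=1/(1+exp(-X%*%beta))        # fitted probabilities
    nabla=t(X)%*%(p-y)            # gradient X'(p - y)
    B=diag(as.vector(p*(1-p)))    # weights W = diag(p(1 - p))
    H=t(X)%*%B%*%X                # Hessian X'WX
    beta=beta-(solve(H)%*%nabla)
  }
  # Re-evaluate p and H at the final beta (inside the loop they lag one update behind)
  p=1/(1+exp(-X%*%beta))
  H=t(X)%*%diag(as.vector(p*(1-p)))%*%X
  log_lik=y%*%log(p)+(1-y)%*%log(1-p)   # Bernoulli log-likelihood
  SE=sqrt(diag(solve(H)))                # inverse Fisher information
  results=list(Coefficients=beta,Log_Lik=log_lik,Std.Err=SE)
  return(results)
}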
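`glm_logistc()` always runs 30 iterations and builds the full n x n matrix `diag(p*(1-p))`. A hypothetical variant, `logistic_newton()` (not part of the original notebook), keeps the same Newton-Raphson update but stops once the step is negligible and forms $X^\top W X$ by scaling the rows of `X`. The tolerance and iteration cap are arbitrary choices, and the demo uses the built-in `mtcars` data as an assumption:

```r
# Hypothetical variant of glm_logistc(): same Newton-Raphson update, a convergence
# test instead of a fixed iteration count, and no n x n diagonal matrix
logistic_newton <- function(X, y, tol = 1e-10, max_iter = 50) {
  X <- as.matrix(X)
  beta <- rep(0, ncol(X))
  for (i in seq_len(max_iter)) {
    p <- drop(plogis(X %*% beta))         # fitted probabilities 1 / (1 + exp(-X beta))
    grad <- crossprod(X, p - y)           # X'(p - y)
    H <- crossprod(X, X * (p * (1 - p)))  # X'WX: row i of X scaled by p_i(1 - p_i)
    step <- drop(solve(H, grad))
    beta <- beta - step
    if (max(abs(step)) < tol) break       # stop once the update is negligible
  }
  # Evaluate probabilities and X'WX at the final estimate
  p <- drop(plogis(X %*% beta))
  H <- crossprod(X, X * (p * (1 - p)))
  list(coefficients = beta,
       log_lik = sum(y * log(p) + (1 - y) * log(1 - p)),
       std_err = sqrt(diag(solve(H))),
       iterations = i)
}

# Illustrative data: transmission type (am) against weight (wt) in mtcars
X_demo <- cbind(ones = 1, wt = mtcars$wt)
fit_nr <- logistic_newton(X_demo, mtcars$am)
# Reference fit with a tight tolerance so glm() is fully converged
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars,
               control = glm.control(epsilon = 1e-12))
print(rbind(newton = fit_nr$coefficients, glm = coef(fit_glm)))
print(fit_nr$iterations)
```

Multiplying `X` by the weight vector recycles it down each column, which scales row i by its weight without allocating an n x n matrix.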


In [8]:
data=read.csv("DeathPenalty.csv")
data$ones<-1
X=data[,c('ones','Agg','VRace')]
y=data[,'Death']


In [9]:
print(glm_logistc(X,y))

$Coefficients
           [,1]
ones  -6.675975
Agg    1.539661
VRace  1.810647

$Log_Lik
          [,1]
[1,] -56.73836

$Std.Err
     ones       Agg     VRace 
0.7574446 0.1867264 0.5361160 



In [10]:
mod=glm('Death~Agg+VRace',data=data,family = "binomial")
summary(mod)



Call:
glm(formula = "Death~Agg+VRace", family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7526  -0.2658  -0.1083  -0.1083   3.2069  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.6760     0.7574  -8.814  < 2e-16 ***
Agg           1.5397     0.1867   8.246  < 2e-16 ***
VRace         1.8106     0.5361   3.377 0.000732 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 321.88  on 361  degrees of freedom
Residual deviance: 113.48  on 359  degrees of freedom
AIC: 119.48

Number of Fisher Scoring iterations: 7
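For a 0/1 response the saturated model has log-likelihood 0, so the residual deviance equals -2 times the log-likelihood. Here -113.48/2 = -56.74, which matches `Log_Lik` from `glm_logistc()`, and AIC = 113.48 + 2 x 3 = 119.48. A small check of both identities on the built-in `mtcars` data (an illustrative assumption):

```r
fit_b <- glm(am ~ wt, family = binomial, data = mtcars)
ll_b <- as.numeric(logLik(fit_b))

# Binary response: residual deviance = -2 * log-likelihood
print(c(minus_half_deviance = -deviance(fit_b) / 2, logLik = ll_b))

# AIC adds 2 per estimated coefficient (binomial has no dispersion parameter)
print(c(AIC = AIC(fit_b), from_logLik = -2 * ll_b + 2 * length(coef(fit_b))))
```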


### GLM Poisson ###

In [11]:
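glm_poisson<-function(X,y)
{
  X<-as.matrix(X)
  beta<-rep(0,ncol(X))
  # 30 Newton-Raphson iterations on the log-likelihood (log link: mu = exp(X beta))
  for(i in 1:30)
  {
    mu=exp(X%*%beta)              # fitted means
    nabla=t(X)%*%(y-mu)           # gradient X'(y - mu)
    B=diag(as.vector(mu))         # weights W = diag(mu)
    H=-1*t(X)%*%B%*%X             # Hessian -X'WX
    beta=beta-(solve(H)%*%nabla)
  }
  # Re-evaluate the Hessian at the final beta (inside the loop it lags one update behind)
  mu=exp(X%*%beta)
  H=-1*t(X)%*%diag(as.vector(mu))%*%X
  SE=sqrt(diag(solve(-H)))        # standard errors from (X'WX)^{-1}
  results=list(Coefficients=beta,Std.Err=SE)
  return(results)
}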
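Unlike the other two functions, `glm_poisson()` returns no log-likelihood. For Poisson counts it is $\ell(\beta) = \sum_i \big[y_i \log\mu_i - \mu_i - \log(y_i!)\big]$, so it could be added after the loop as `sum(dpois(y, exp(as.matrix(X) %*% beta), log = TRUE))` (a hypothetical addition). For the notebook's data, the printed AIC of 385.51 with three coefficients implies a log-likelihood of roughly -189.8. The sketch below checks the formula against `logLik()` on simulated counts; the simulation settings are arbitrary assumptions:

```r
set.seed(42)
sim <- data.frame(x = rnorm(200))
sim$y <- rpois(200, lambda = exp(0.5 + 0.3 * sim$x))   # arbitrary simulation settings

fit_p <- glm(y ~ x, family = poisson, data = sim)
mu <- fitted(fit_p)

# Poisson log-likelihood written out term by term; lgamma(y + 1) = log(y!)
ll_manual <- sum(sim$y * log(mu) - mu - lgamma(sim$y + 1))
print(c(manual = ll_manual,
        dpois = sum(dpois(sim$y, mu, log = TRUE)),
        logLik = as.numeric(logLik(fit_p))))
```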

In [12]:
data=read.csv("poisson_sim.csv")
data$ones<-1
X<-data[,c('ones','prog','math')]
y=data[,'num_awards']

In [13]:
print(glm_poisson(X,y))

$Coefficients
           [,1]
ones -5.5780569
prog  0.1232726
math  0.0861210

$Std.Err
       ones        prog        math 
0.676822577 0.163261060 0.009586059 



In [14]:
mod=glm('num_awards~prog+math',data=data,family = "poisson")
summary(mod)



Call:
glm(formula = "num_awards~prog+math", family = "poisson", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1840  -0.9003  -0.5891   0.3948   2.9539  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.578057   0.676823  -8.242   <2e-16 ***
prog         0.123273   0.163261   0.755     0.45    
math         0.086121   0.009586   8.984   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 287.67  on 199  degrees of freedom
Residual deviance: 203.45  on 197  degrees of freedom
AIC: 385.51

Number of Fisher Scoring iterations: 6


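Comparing the outputs above, the coefficient estimates and standard errors from my implementation match those from base R's `glm()` for all three models. The log-likelihoods can be checked as well: for the linear model `glm()` reports AIC = 2917.2 with four estimated parameters (three coefficients plus the variance), which corresponds to a log-likelihood of (8 - 2917.2)/2 = -1454.6; for the logistic model the log-likelihood is minus half the residual deviance, -113.48/2 = -56.74. `glm_poisson()` does not return a log-likelihood, so for that model only coefficients and standard errors are compared.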
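The comparison can also be done programmatically. A hypothetical helper, `compare_with_glm()`, lines up the `Coefficients` and `Std.Err` entries returned by the functions above against `summary(mod)$coefficients`; it assumes the columns of `X` are in the same order as the `glm()` terms, intercept first, as they are in this notebook. Because `glm()` stops once the relative change in deviance drops below its default `epsilon = 1e-8`, the default tolerance is deliberately loose. The demo uses a closed-form least-squares fit on `mtcars` as an assumption; in the notebook one could call, for example, `compare_with_glm(glm_logistc(X, y), mod)` right after the logistic cells.

```r
# Hypothetical helper: compare a first-principles fit (a list with Coefficients
# and Std.Err) with a fitted glm object, term by term in the same order
compare_with_glm <- function(custom, mod, tol = 1e-4) {
  est <- summary(mod)$coefficients
  tab <- data.frame(
    custom_coef = unname(drop(custom$Coefficients)),
    glm_coef    = unname(est[, "Estimate"]),
    custom_se   = unname(drop(custom$Std.Err)),
    glm_se      = unname(est[, "Std. Error"]),
    row.names   = rownames(est)
  )
  agree <- isTRUE(all.equal(tab$custom_coef, tab$glm_coef, tolerance = tol)) &&
    isTRUE(all.equal(tab$custom_se, tab$glm_se, tolerance = tol))
  list(table = tab, agree = agree)
}

# Illustrative check: closed-form least squares on mtcars
X_demo <- cbind(ones = 1, wt = mtcars$wt, hp = mtcars$hp)
y_demo <- mtcars$mpg
H <- crossprod(X_demo)
beta <- solve(H, crossprod(X_demo, y_demo))
s2 <- sum((y_demo - X_demo %*% beta)^2) / (nrow(X_demo) - ncol(X_demo))
custom_fit <- list(Coefficients = beta, Std.Err = sqrt(diag(solve(H)) * s2))

res <- compare_with_glm(custom_fit, glm(mpg ~ wt + hp, family = gaussian, data = mtcars))
print(res$table)
print(res$agree)
```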