<a href="https://colab.research.google.com/github/DepartmentOfStatisticsPUE/cda-2022/blob/main/homeworks/hw1_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Exercise

Assume that $X$ follows a Poisson distribution with probability mass function

$$
P(X=x_i; \lambda_i) = \frac{\lambda_i^{x_i} e^{-\lambda_i}}{x_i!},
$$

where $\lambda_i = \theta_0 + \theta_1 \times z_i$, $\theta_0=0.5$, $\theta_1=0.5$, $z_i \sim \text{Bern}(0.7)$, and the number of observations is $n=10{,}000$.

Tasks:

+ generate $z_i$,
+ generate $\lambda_i$ according to $\theta_0 + \theta_1 \times z_i$,
+ generate $\boldsymbol{X} \sim \text{Poisson}(\boldsymbol{\lambda})$,
+ derive the log-likelihood, gradient and Hessian,
+ obtain the MLE of $\boldsymbol{\theta} = (\theta_0, \theta_1)$ using the Newton-Raphson method.

## Derivation of the log-likelihood, gradient and Hessian

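Log-likelihood contribution of observation $i$ (the term $-\log(x_i!)$ is omitted because it does not depend on $\boldsymbol{\theta}$)

$$
\log L_i(\theta_0, \theta_1; x_i, z_i) = -\lambda_i + x_i \log(\lambda_i) = -(\theta_0 + \theta_1z_i) + x_i \log(\theta_0 + \theta_1z_i)
$$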

Gradient 

$$
\frac{\partial \log L_i}{\partial \boldsymbol{\theta}} = 
\begin{bmatrix}
\frac{\partial \log L_i}{\partial \theta_0} \\
\frac{\partial \log L_i}{\partial \theta_1}
\end{bmatrix} = 
\begin{bmatrix}
\frac{x_i}{\theta_0 + \theta_1 z_i} - 1\\
\frac{x_i z_i}{\theta_0 + \theta_1 z_i} - z_i
\end{bmatrix}
$$

Hessian

$$
\frac{\partial^2 \log L_i}{\partial \boldsymbol{\theta}^2} = 
\begin{bmatrix}
\frac{\partial^2 \log L_i}{\partial \theta_0^2} & \frac{\partial^2 \log L_i}{\partial \theta_1 \partial\theta_0} \\
\frac{\partial^2 \log L_i}{\partial \theta_0 \partial\theta_1} & \frac{\partial^2 \log L_i}{\partial \theta_1^2} \\
\end{bmatrix} = 
\begin{bmatrix}
\frac{-x_i}{(\theta_0+ \theta_1 z_i)^2} & \frac{-x_i z_i}{(\theta_0+ \theta_1 z_i)^2} \\
\frac{-x_i z_i}{(\theta_0+ \theta_1 z_i)^2} & \frac{-x_i z_i^2}{(\theta_0+ \theta_1 z_i)^2} \\
\end{bmatrix}
$$
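
The gradient and Hessian above are contributions of a single observation. As a brief sketch of how they enter the estimation step (this is the textbook Newton-Raphson update; the iteration index $k$ is notation introduced here for illustration), both can be written compactly with $\lambda_i = \theta_0 + \theta_1 z_i$:

$$
\frac{\partial \log L_i}{\partial \boldsymbol{\theta}} = \left(\frac{x_i}{\lambda_i} - 1\right)
\begin{bmatrix} 1 \\ z_i \end{bmatrix},
\qquad
\frac{\partial^2 \log L_i}{\partial \boldsymbol{\theta}^2} = -\frac{x_i}{\lambda_i^2}
\begin{bmatrix} 1 & z_i \\ z_i & z_i^2 \end{bmatrix}
$$

Summing over the sample gives the update

$$
\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \left[\sum_{i=1}^{n} \frac{\partial^2 \log L_i}{\partial \boldsymbol{\theta}^2}\right]^{-1} \sum_{i=1}^{n} \frac{\partial \log L_i}{\partial \boldsymbol{\theta}},
$$

where both sums are evaluated at $\boldsymbol{\theta}^{(k)}$ and the iteration stops once the summed gradient is close to zero.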

Solution using R only

In [11]:
%load_ext rpy2.ipython

In [None]:
%%R
install.packages("maxLik")

In [17]:
%%R
library(maxLik)

Generate data

In [14]:
%%R
set.seed(123)
n <- 10000
z <- rbinom(n = n, prob = 0.7, size = 1)
theta_true <- c(0.5, 0.5)
lambda_true <- theta_true[1] + theta_true[2]*z ## lambda_i = 0.5 + 0.5*z_i
X <- rpois(n = n, lambda = lambda_true)
table(X)

X
   0    1    2    3    4    5    6    7 
4401 3453 1576  442  104   20    3    1 


Functions

In [15]:
%%R
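# Log-likelihood summed over observations (the constant -log(x_i!) is dropped)
ll <- function(theta, z, X) {
  lam <- theta[1] + theta[2] * z  # lambda_i = theta_0 + theta_1 * z_i
  l <- X * log(lam) - lam
  return(sum(l))
}

# Gradient: column sums of the per-observation score contributions
ll_grad <- function(theta, z, X) {
  lam <- theta[1] + theta[2] * z

  l_g <- matrix(0, nrow = NROW(lam), ncol = 2)
  l_g[, 1] <- X / lam - 1      # derivative with respect to theta_0
  l_g[, 2] <- X * z / lam - z  # derivative with respect to theta_1

  return(colSums(l_g))
}

# Hessian: 2 x 2 matrix of summed second derivatives
ll_hess <- function(theta, z, X) {
  lam <- theta[1] + theta[2] * z

  l_h <- matrix(0, nrow = 2, ncol = 2)
  l_h[1, 1] <- sum(-X / lam^2)
  l_h[2, 2] <- sum(-X * z^2 / lam^2)
  l_h[1, 2] <- l_h[2, 1] <- sum(-X * z / lam^2)

  return(l_h)
}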
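
Before passing an analytic gradient to an optimiser, it can be worth comparing it with a numerical approximation. The sketch below is one way to do that, under stated assumptions: `num_grad` is a hypothetical helper written for this illustration (central finite differences with step `h`), the evaluation point `theta_test` is arbitrary, and the data and functions are repeated in compact form so that the block runs on its own (inside this notebook it would need the `%%R` cell magic).

```r
# Illustrative check only: data regenerated as in the "Generate data" cell
set.seed(123)
n <- 10000
z <- rbinom(n = n, prob = 0.7, size = 1)
X <- rpois(n = n, lambda = 0.5 + 0.5 * z)

# Compact versions of the log-likelihood and its analytic gradient
ll <- function(theta, z, X) {
  lam <- theta[1] + theta[2] * z
  sum(X * log(lam) - lam)
}
ll_grad <- function(theta, z, X) {
  lam <- theta[1] + theta[2] * z
  c(sum(X / lam - 1), sum(X * z / lam - z))
}

# Hypothetical helper: central finite-difference approximation of the gradient
num_grad <- function(theta, z, X, h = 1e-6) {
  sapply(seq_along(theta), function(j) {
    e <- replace(numeric(length(theta)), j, h)  # step of size h in coordinate j
    (ll(theta + e, z, X) - ll(theta - e, z, X)) / (2 * h)
  })
}

theta_test <- c(1, 1)  # arbitrary evaluation point
analytic <- ll_grad(theta_test, z, X)
numeric_approx <- num_grad(theta_test, z, X)
print(rbind(analytic, numeric_approx))
```

If the two rows agree to several significant digits, the analytic gradient is very likely coded correctly; the same idea applied to `ll_grad` itself would check the Hessian.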

In [None]:
%%R
solution <- maxLik(logLik = ll, grad = ll_grad, hess = ll_hess,
                   start = c(theta0 = 1, theta1 = 1), z = z, X = X, method = "NR")
summary(solution)

```r
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 4 iterations
Return code 1: gradient close to zero (gradtol)
Log-Likelihood: -9508.655 
2  free parameters
Estimates:
       Estimate Std. error t value Pr(> t)    
theta0  0.48171    0.01277   37.71  <2e-16 ***
theta1  0.51858    0.01747   29.69  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------

```
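
To make the Newton-Raphson step itself visible, the iteration can also be written by hand in base R. This is a minimal sketch, not a reproduction of what `maxLik` does internally. Its assumptions are all choices made for this illustration: the step-halving safeguard (`alpha`), the gradient tolerance `1e-6`, the cap of 100 iterations, and the guard that returns `-Inf` when some $\lambda_i \le 0$. The safeguard matters here because a full, unscaled Newton step from the starting point $(1, 1)$ can push $\lambda_i$ below zero.

```r
# Data regenerated as in the "Generate data" cell
set.seed(123)
n <- 10000
z <- rbinom(n = n, prob = 0.7, size = 1)
X <- rpois(n = n, lambda = 0.5 + 0.5 * z)

# Log-likelihood (constant dropped); -Inf outside the parameter space
ll <- function(theta) {
  lam <- theta[1] + theta[2] * z
  if (any(lam <= 0)) return(-Inf)
  sum(X * log(lam) - lam)
}
ll_grad <- function(theta) {
  lam <- theta[1] + theta[2] * z
  c(sum(X / lam - 1), sum(X * z / lam - z))
}
ll_hess <- function(theta) {
  lam <- theta[1] + theta[2] * z
  h12 <- sum(-X * z / lam^2)
  matrix(c(sum(-X / lam^2), h12, h12, sum(-X * z^2 / lam^2)), nrow = 2)
}

theta <- c(1, 1)  # same starting values as in the maxLik call
for (iter in 1:100) {
  g <- ll_grad(theta)
  if (max(abs(g)) < 1e-6) break      # stop when the gradient is close to zero
  step <- solve(ll_hess(theta), g)   # solves H %*% step = g
  alpha <- 1
  # Halve the step until the log-likelihood does not decrease
  while (ll(theta - alpha * step) < ll(theta) && alpha > 1e-8) {
    alpha <- alpha / 2
  }
  theta <- theta - alpha * step
}

# Because z is binary, the MLE also has a closed form based on group means
theta_closed <- c(mean(X[z == 0]), mean(X[z == 1]) - mean(X[z == 0]))
print(rbind(newton_raphson = theta, closed_form = theta_closed))
```

The closed-form row follows from setting the gradient to zero: with a binary $z_i$, the fitted $\lambda_i$ equals the sample mean of $x_i$ within each group, so $\hat\theta_0$ is the mean for $z_i = 0$ and $\hat\theta_0 + \hat\theta_1$ is the mean for $z_i = 1$. This gives an independent check on any iterative solution.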