# Poisson Regression

Poisson regression is also a special case of the generalized linear model, where the random component is specified by the Poisson distribution.<br><br>
**Sections**
- [1.0 Synthetic Data & Model](#1.0-Synthetic-Data-&-Model)
- [2.0 Newton Raphson Algorithm](#2.0-Newton-Raphson-Algorithm)
- [3.0 Newton Raphson Implementation](#3.0-Newton-Raphson-Implementation)
    - [3.1 Checking Convergence](#3.1-Checking-Convergence)
- [4.0 Variance of Estimators](#4.0-Variance-of-Estimators)
    - [4.1 Covariance of $L'(\theta)$](#4.1-Covariance-of-$L'(\theta)$)
    - [4.2 Covariance of $\hat\theta$](#4.2-Covariance-of-$\hat\theta$)

### 0. Importing Modules

In [2]:
import math
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import bokeh
from bokeh.plotting import figure, show
from bokeh.models import tickers, ranges
from bokeh.io import output_notebook
output_notebook()

### 1.0 Synthetic Data & Model

$\hspace{80 mm}$ Poisson distribution:

\begin{align}
\large
P(Y = y) = \dfrac{\lambda^{y}}{y!}{\rm e}^{-\lambda}
\end{align}

$\hspace{80 mm}$ We model $\lambda$ as a regression function:

\begin{align}
\large
\lambda  = a + bx \\
\end{align}

$\hspace{80 mm}$ However, since we need $ a + bx > 0$, it is more usual to take:

\begin{equation}
\large
\lambda  = {\rm e}^{a + bx} \\
\\
\large
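\text{Log-likelihood} = L(a,b) = a\hspace{1mm} n \hspace{1mm}\bar y + b\hspace{1mm} z - {\rm e}^{a}\sum_{i=1}^{n}{\rm e}^{bx_i}\\
\end{equation}

Here $z = \sum_{i=1}^{n} x_i y_i$, and the term $-\sum_{i=1}^{n}\log(y_i!)$ is dropped because it does not depend on $a$ or $b$.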
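
As a minimal sketch of the log-likelihood above, the closed form can be checked against a direct sum of Poisson log-probabilities with $\lambda_i = {\rm e}^{a + bx_i}$. The function name `poisson_loglik` and the toy arrays are illustrative assumptions, not part of the original notebook. The two quantities differ only by the constant $\sum_i \log(y_i!)$, which does not depend on $a$ or $b$.

```python
import math
import numpy as np

def poisson_loglik(a, b, x, y):
    """Log-likelihood of (a, b), up to the additive constant -sum(log(y_i!))."""
    n = len(x)
    z = np.sum(x * y)  # z = sum of x_i * y_i
    return a * n * np.mean(y) + b * z - np.exp(a) * np.sum(np.exp(b * x))

# Toy data, chosen only for illustration
x = np.array([-1.0, 0.0, 0.5, 2.0])
y = np.array([3, 1, 2, 0])
a, b = 0.5, -1.0

# Direct sum of log P(Y = y_i) with lambda_i = exp(a + b * x_i)
lam = np.exp(a + b * x)
full = sum(yi * math.log(li) - li - math.lgamma(yi + 1) for yi, li in zip(y, lam))
const = sum(math.lgamma(yi + 1) for yi in y)  # sum of log(y_i!)

# The closed form equals the full log-likelihood plus the dropped constant
print(np.isclose(poisson_loglik(a, b, x, y), full + const))
```

Dropping the constant changes the value of the log-likelihood but not the location of its maximum, which is why it can be ignored when estimating $a$ and $b$.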

Steps: <br>
- Model lambda as a regression function
- Sample 100 values from $ X_{i} \sim N(0,1)$
- Compute 100 rates $ \lambda_{i}  = {\rm e}^{a + bx_{i}} $
- Sample 1 value from $ Y_{i} \sim Pois(\lambda_{i})$ for **each i** (i.e., $ \lambda_{i} $)

In [3]:
n = 100
a = 0.5
b = -1
x_i = np.random.normal(0, 1, n)
lambda_vals = np.exp(a + b*x_i)      # shape = 100
y_i = np.random.poisson(lambda_vals) # If size is None (default), a single value is returned if lam is a scalar. 
                                     #  Otherwise, np.array(lam).size samples are drawn. 

# Plotting
p = figure(toolbar_location= None, outline_line_color = 'black')
p.plot_width = 350
p.plot_height = 350
p.scatter(x = x_i, y = y_i, size=8, line_width = 1, line_color = 'black', fill_color = 'firebrick', legend_label="Data")
p.xaxis.axis_label = 'x'
p.yaxis.axis_label = 'y'
p.legend.border_line_color = "black"
p.legend.border_line_alpha = 1
p.legend.label_text_color = 'black'
show(p)    

### 2.0 Newton Raphson Algorithm

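We estimate $a$ and $b$ by maximum likelihood using Newton-Raphson, which iterates $\theta_{k+1} = \theta_k - J^{-1}(\theta_k)\,L'(\theta_k)$ with $\theta = (a, b)$. The helper function below returns the matrix of second derivatives $J(\theta)$ and the gradient vector $L'(\theta)$.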

$$
J(a,b)=\begin{pmatrix}
    \dfrac{\partial^2 L}{\partial a^2}   & \dfrac{\partial^2 L}{\partial a \hspace{1mm} \partial b}\\
    \dfrac{\partial^2 L}{\partial a \hspace{1mm} \partial b}   & \dfrac{\partial^2 L}{\partial b^2}
\end{pmatrix} \\
\\
\text{}\\
L'(a,b) = \left(\dfrac{\partial L}{\partial a}, \dfrac{\partial L}{\partial b}\right)^{T}
$$

In [4]:
def get_J_and_Lprime(x_arr, y_arr, a, b, n):
    """
    Computes the J matrix (second derivatives) and the L' vector (gradient)
    of the log-likelihood for a Poisson distribution where lambda is
    modeled as:

        lambda = exp(a + b*x)
    """
    z = np.sum(x_arr*y_arr)

    # Get derivatives for NR
    dL_da = n*np.mean(y_arr) - np.exp(a)*np.sum(np.exp(b*x_arr))
    dL_db = z - np.exp(a)*np.sum(x_arr*np.exp(b*x_arr))

    # Second Partial Derivatives
    dL_da2 = - np.exp(a)*np.sum(np.exp(b*x_arr))
    dL_db2 = - np.exp(a)*np.sum((x_arr**2)*np.exp(b*x_arr))
    dL_dadb = - np.exp(a)*np.sum(x_arr*np.exp(b*x_arr))
    
    J = np.array([[dL_da2,  dL_dadb],
              [dL_dadb, dL_db2]])
    
    L_prime = np.array([[dL_da],[dL_db]])
    return J, L_prime

Compute $J(\theta)$ matrix and $L'(\theta)$ vector for true parameters

In [5]:
J, L_prime = get_J_and_Lprime(x_i, y_i, a, b, n)
print("J matrix:\n", J)
print("\nL_prime vector:\n", L_prime)

J matrix:
 [[-215.32478629  153.3362663 ]
 [ 153.3362663  -304.18248339]]

L_prime vector:
 [[-2.32478629]
 [21.8208365 ]]
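
A finite-difference comparison is one way to sanity-check analytic derivatives like those in the helper above. The sketch below is self-contained and uses its own hypothetical function names (`loglik`, `gradient`, `hessian`), its own seeded synthetic data, and an arbitrary evaluation point, so its numbers will not match the output shown above.

```python
import numpy as np

def loglik(a, b, x, y):
    # Log-likelihood up to a constant, with lambda_i = exp(a + b * x_i)
    return a * np.sum(y) + b * np.sum(x * y) - np.exp(a) * np.sum(np.exp(b * x))

def gradient(a, b, x, y):
    # Analytic first derivatives with respect to a and b
    lam = np.exp(a + b * x)
    return np.array([np.sum(y - lam), np.sum(x * (y - lam))])

def hessian(a, b, x, y):
    # Analytic second derivatives (the J matrix)
    lam = np.exp(a + b * x)
    return -np.array([[np.sum(lam), np.sum(x * lam)],
                      [np.sum(x * lam), np.sum(x**2 * lam)]])

rng = np.random.default_rng(0)   # seed chosen arbitrarily
x = rng.normal(0, 1, 100)
y = rng.poisson(np.exp(0.5 - 1.0 * x))

a, b, h = 0.3, -0.7, 1e-5        # arbitrary evaluation point and step size

# Central differences of the log-likelihood approximate the gradient
num_grad = np.array([
    (loglik(a + h, b, x, y) - loglik(a - h, b, x, y)) / (2 * h),
    (loglik(a, b + h, x, y) - loglik(a, b - h, x, y)) / (2 * h),
])

# Central differences of the gradient approximate the columns of J
num_hess = np.column_stack([
    (gradient(a + h, b, x, y) - gradient(a - h, b, x, y)) / (2 * h),
    (gradient(a, b + h, x, y) - gradient(a, b - h, x, y)) / (2 * h),
])

print(np.allclose(num_grad, gradient(a, b, x, y), rtol=1e-5, atol=1e-6))
print(np.allclose(num_hess, hessian(a, b, x, y), rtol=1e-5, atol=1e-6))
```

Agreement between the numerical and analytic values suggests the derivative formulas are coded correctly; it does not say anything about whether the model itself fits the data.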


Algorithm function

In [6]:
def newton_n_iter(x, y, a_o, b_o,  tolerance = 0.00001, output_message = False):
    """
    Performs Newton-Raphson until the largest absolute change in a or b
    between iterations is no greater than the tolerance.
    Args:
        x, y (array): covariate values and observed counts
        a_o, b_o (float): initial guesses for a and b
        tolerance (float): convergence threshold on the parameter change
        output_message (bool): if True, also return a log of the iterations
    
    """
    #Initialize
    a = [a_o]
    b = [b_o]
    difference = tolerance * 5 # Enter Loop
    iter_number = 0
    status_message = 'Starting with Guess = ' + str(a_o) + "," + str(b_o) + '\n'

    while abs(difference) > tolerance:
    
        # Derivatives at the current estimate, computed from the data passed in
        J, L = get_J_and_Lprime(x, y, a_o, b_o, len(x))
   
        a_1 ,b_1 = np.array([[a_o],[b_o]]) - np.linalg.inv(J) @ L
        a.append(a_1[0])
        b.append(b_1[0])
        
        # Largest absolute change in either parameter; update iteration state
        difference = max(abs(a_1[0] - a_o), abs(b_1[0] - b_o))
        a_o, b_o = a_1[0], b_1[0]
    
        iter_number += 1
        status_message += 'Iteration #' + str(iter_number) + ':= ' + str(a_1[0]) + "," + str(b_1[0])+ '\n'
        
    status_message += 'Total No. of Iterations = '  +  str(iter_number)
    
    if output_message:
        return a_o, b_o, status_message, a, b
    return a_o, b_o, a, b
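
The loop above runs until the update is small, with no upper bound on the number of iterations. The following is a minimal alternative sketch, not the notebook's implementation: it caps the iteration count and obtains the Newton step with `np.linalg.solve` rather than an explicit inverse. The name `newton_poisson`, the `max_iter` parameter, the seed, and the starting point $a_0 = \log \bar y$, $b_0 = 0$ are all assumptions made for illustration.

```python
import numpy as np

def newton_poisson(x, y, a0, b0, tol=1e-8, max_iter=50):
    """Newton-Raphson for lambda_i = exp(a + b * x_i); returns (theta, n_iter)."""
    theta = np.array([a0, b0], dtype=float)
    for k in range(1, max_iter + 1):
        lam = np.exp(theta[0] + theta[1] * x)
        grad = np.array([np.sum(y - lam), np.sum(x * (y - lam))])
        J = -np.array([[np.sum(lam), np.sum(x * lam)],
                       [np.sum(x * lam), np.sum(x**2 * lam)]])
        step = np.linalg.solve(J, grad)   # solves J @ step = grad
        theta = theta - step              # theta_new = theta - J^{-1} L'
        if np.max(np.abs(step)) < tol:
            return theta, k
    raise RuntimeError("Newton-Raphson did not converge within max_iter steps")

rng = np.random.default_rng(1)   # arbitrary seed, for repeatability only
x = rng.normal(0, 1, 100)
y = rng.poisson(np.exp(0.5 - 1.0 * x))

# Assumed starting point: intercept from the mean count, zero slope
theta_hat, n_iter = newton_poisson(x, y, a0=np.log(np.mean(y)), b0=0.0)
lam_hat = np.exp(theta_hat[0] + theta_hat[1] * x)

print(theta_hat, n_iter)
# Both components of the gradient should be close to zero at the estimate
print(np.sum(y - lam_hat), np.sum(x * (y - lam_hat)))
```

Raising an error when the cap is reached makes a failed fit visible instead of leaving the loop running indefinitely from a poor starting point.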

### 3.0 Newton Raphson Implementation

In [7]:
(a_hat, b_hat, status_message, a_array, b_array) = newton_n_iter(x_i, y_i, 1.6, 2, output_message = True)
print(status_message)

Starting with Guess = 1.6,2
Iteration #1:= 1.2002971139518022,1.6266636452190024
Iteration #2:= 1.2536990210486982,0.9197723776803928
Iteration #3:= 1.3392836031522943,0.0471264174807251
Iteration #4:= 1.004248197122873,-0.5329314908129316
Iteration #5:= 0.6778715384899304,-0.8124224950672942
Iteration #6:= 0.5671260576587114,-0.8901302818067861
Iteration #7:= 0.5584578502236155,-0.8957823142371337
Iteration #8:= 0.5584112467728466,-0.8958121954283318
Iteration #9:= 0.5584112454447829,-0.8958121962788408
Total No. of Iterations = 9


#### 3.1 Checking Convergence

In [8]:
# Check that the derivatives of the parameters in last iteration are about zero
J, L_prime = get_J_and_Lprime(x_i, y_i, a_hat, b_hat, n)
print("\nL vector:\n", L_prime)


L vector:
 [[0.]
 [0.]]


In [9]:
# Plotting
p = figure(toolbar_location= None, outline_line_color = 'black')
p.plot_width = 350
p.plot_height = 350
p.line(x = range(len(a_array)), y = a_array, line_width = 2, line_color = 'firebrick', legend_label="a")
p.line(x = range(len(a_array)), y = b_array, line_width = 2, line_color = 'green', legend_label="b")
p.xaxis.axis_label = 'Iteration #'
p.yaxis.axis_label = 'a and b'
p.legend.border_line_color = "black"
p.legend.border_line_alpha = 1
p.legend.label_text_color = 'black'
p.legend.location = 'top_right'
show(p)  

### 4.0 Variance of Estimators

In [10]:
print(a_hat)
print(b_hat)

0.5584112454447829
-0.8958121962788408


#### 4.1 Covariance of $L'(\theta)$

$$
\large
Cov L'(a,b)=\begin{pmatrix}
    Var(\dfrac{\partial L}{\partial a})   & Cov(\dfrac{\partial L}{\partial a}, \dfrac{\partial L}{\partial b})\\
    Cov(\dfrac{\partial L}{\partial a}, \dfrac{\partial L}{\partial b})   & Var(\dfrac{\partial L}{\partial b})\\
    \end{pmatrix}
     =\begin{pmatrix}     
     \sum_{i=1}^{n}\lambda_i   & \sum_{i=1}^{n}x_i\lambda_i\\
     \sum_{i=1}^{n}x_i\lambda_i   & \sum_{i=1}^{n}x_i^2\lambda_i 
\end{pmatrix} = \Omega
$$

These entries follow from $\partial L/\partial a = \sum_{i}(y_i - \lambda_i)$ and $\partial L/\partial b = \sum_{i} x_i (y_i - \lambda_i)$, with independent $y_i$ and $Var(y_i) = \lambda_i = {\rm e}^{a + bx_i}$.
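
The closed form for $\Omega$ can be checked by simulation. The sketch below holds the covariates fixed, draws many data sets at assumed parameter values ($a = 0.5$, $b = -1$, as in section 1.0), evaluates $L'(a,b)$ on each, and compares the empirical covariance with the formula. The seed, the number of replications, and the variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)    # arbitrary seed
a, b, n = 0.5, -1.0, 100
x = rng.normal(0, 1, n)           # covariates held fixed across replications
lam = np.exp(a + b * x)

# Omega from the closed form
omega = np.array([[np.sum(lam), np.sum(x * lam)],
                  [np.sum(x * lam), np.sum(x**2 * lam)]])

# Simulate many data sets and evaluate the gradient L'(a, b) on each one
reps = 20000
y = rng.poisson(lam, size=(reps, n))              # one row per data set
scores = np.column_stack([np.sum(y - lam, axis=1),
                          np.sum(x * (y - lam), axis=1)])
empirical = np.cov(scores, rowvar=False)          # 2 x 2 covariance matrix

print(omega)
print(empirical)   # should agree with omega up to Monte Carlo error
```

The agreement is only approximate because the empirical covariance is itself an estimate; increasing `reps` tightens it.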

In [11]:
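def get_COV_L_prime(x_arr, y_arr, a, b, n):
    """
    Computes the covariance matrix of L' (the matrix Omega),
    with lambda_i = exp(a + b*x_i)
    """
    lambda_i =  np.exp(a + b*x_arr)
    # Entries of Omega: sums of lambda_i weighted by 1, x_i and x_i^2
    var_dL_da = np.sum(lambda_i)
    var_dL_db = np.sum((x_arr**2) * lambda_i)
    cov_dL_dadb = np.sum(x_arr * lambda_i)
    
    COV_L = np.array([[var_dL_da,  cov_dL_dadb],
                      [cov_dL_dadb, var_dL_db]])
    
    return COV_L

COV_L_prime = get_COV_L_prime(x_i, y_i, a_hat, b_hat, n)
COV_L_prime

array([[102.06061146,  24.06393475],
       [ 24.06393475,  83.36284603]])

Note: the matrix recorded above, and the $Cov\,\hat\theta$ output in 4.2 that is computed from it, come from a run in which `lambda_i` was `np.exp(x_arr + b*x_arr)`. Rerunning both cells with the function as written here gives the values for $\lambda_i = {\rm e}^{\hat a + \hat b x_i}$.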

#### 4.2 Covariance of $\hat\theta$

$$
\large
Cov\hspace{1mm}\hat\theta = J^{-1}(\theta)\hspace{2mm}CovL'(\theta)\hspace{2mm}J^{-1}(\theta)
\\
\large
Cov \hat\theta = \begin{pmatrix}
    Var(\hat a)   & Cov(\hat a,\hat b)\\
    Cov(\hat a,\hat b)  & Var(\hat b)\\
    \end{pmatrix}
$$

In [12]:
COV_theta_hat = np.linalg.inv(J) @ COV_L_prime @ np.linalg.inv(J)
COV_theta_hat

array([[0.00639202, 0.00460148],
       [0.00460148, 0.00403912]])
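
For this model, the formula for $\Omega$ in 4.1 has the same entries as $-J$ in 2.0 when both are evaluated at the same $(a, b)$, so the sandwich expression reduces to $-J^{-1}$. The sketch below illustrates that identity and derives standard errors from the diagonal. It uses its own seeded covariates and rounded parameter values as stand-ins, so its numbers are not the notebook's.

```python
import numpy as np

rng = np.random.default_rng(3)    # arbitrary seed
x = rng.normal(0, 1, 100)         # stand-in covariates, not the notebook's x_i
a_hat, b_hat = 0.56, -0.90        # illustrative values near the estimates in 4.0

lam = np.exp(a_hat + b_hat * x)
omega = np.array([[np.sum(lam), np.sum(x * lam)],
                  [np.sum(x * lam), np.sum(x**2 * lam)]])
J = -omega                        # second-derivative matrix at the same point

J_inv = np.linalg.inv(J)
sandwich = J_inv @ omega @ J_inv  # J^{-1} Cov(L') J^{-1}

# The sandwich collapses to -J^{-1} because omega equals -J
print(np.allclose(sandwich, -J_inv))

# Standard errors of a_hat and b_hat are the square roots of the diagonal
std_err = np.sqrt(np.diag(sandwich))
print(std_err)
```

The identity relies on the variance of each $y_i$ being equal to $\lambda_i$; if the counts are more dispersed than the Poisson model assumes, the two matrices differ and the full sandwich form is the one to use.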