# **IEOR E4650  Business Analytics (Fall 2019)**

## **Lecture 9: Poisson Model with Observed Heterogeneity**

In this lecture, we discuss how to model observed heterogeneity in a count model, that is, how to let observed covariates explain differences in $\lambda$ across customers.

Learning objectives:

* Understand the idea of observed heterogeneity
* Understand how to estimate a count model with observed heterogeneity
* Understand how to use the model for prediction



## Observed Heterogeneity
### (A model with covariates)

In the models with unobserved heterogeneity from previous lectures, we relied on the distribution of the data itself to tell us what the distribution of $\lambda$ looks like.

There are many factors that could determine the $\lambda$ for each customer. For example, $\lambda$ might depend on the customer's gender or age. If such information is available in the data, we can use it to model the variation in $\lambda$ directly. This is called modeling observed heterogeneity.

For this lecture, we will use a count dataset with covariates.

Setting of this study:

The state wildlife biologists want to model how many fish are caught by fishermen at a state park. Visitors are asked whether they had live bait, whether they have a camper, how many people were in the group, how many children were in the group, and how many fish were caught. Some visitors do not fish, but the data do not record whether a person fished. Visitors who did not fish necessarily report zero fish, on top of the visitors who fished but caught nothing, so the data contain excess zeros.
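A small simulation can make the excess-zeros mechanism concrete. This is a sketch with made-up numbers (the 30% non-fishing share and the rate of 3 fish are hypothetical, not estimates from this dataset): non-fishers always report zero, so the observed share of zeros exceeds what a single Poisson distribution with the same mean would imply.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
share_not_fishing = 0.3   # hypothetical share of visitors who did not fish
rate_if_fishing = 3.0     # hypothetical Poisson rate for visitors who fished

fished = rng.random(n) >= share_not_fishing
# Non-fishers always catch 0; fishers catch Poisson(rate_if_fishing) fish, which can also be 0.
catch = np.where(fished, rng.poisson(rate_if_fishing, size=n), 0)

observed_zero_share = np.mean(catch == 0)
# A single Poisson with the same overall mean would predict P(Y=0) = exp(-mean).
poisson_zero_share = np.exp(-catch.mean())

print(f"observed share of zeros:        {observed_zero_share:.3f}")
print(f"Poisson-implied share of zeros: {poisson_zero_share:.3f}")
```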



|Variable|Description|
|---|---|
|livebait|whether the visitor has live bait or not|
|child| how many children were in the party |
|persons| how many people were in the party|
|camper| whether this party camped|
|Nfish| The number of fish caught|


In [0]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as spst
import scipy.special as spsp
from scipy.optimize import minimize

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download the fishing data from Google Drive (the file id follows "id=" in the share link).
link="https://drive.google.com/open?id=1ytZ6QqLH-ES1YxxYbOGwMERVep0Tkhgu"
_,file_id=link.split("=")
downloaded = drive.CreateFile({'id':file_id})
downloaded.GetContentFile('myfile.csv')

Fishing = pd.read_csv('myfile.csv')
Fishing.head(10)


## Basic statistics

Let's use some basic statistics to check how each factor, taken on its own, relates to the average number of fish caught.

In [0]:
Fishing[["livebait","Nfish"]].groupby("livebait").mean()

In [0]:
Fishing[["child","Nfish"]].groupby("child").mean()

In [0]:
Fishing[["persons","Nfish"]].groupby("persons").mean()

In [0]:
Fishing[["camper","Nfish"]].groupby("camper").mean()

## Simple Poisson model with covariates

Previously, we used $\exp(\beta_0)$ to model $\lambda$, which keeps $\lambda$ positive for any value of $\beta_0$.

If we have a model with covariates, we can simply use $\lambda_i=\exp(\beta_0+\beta_1x_{1i}+\beta_2x_{2i})$. In this case, every customer has a different $\lambda_i$ depending on the values of the covariates.
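With several covariates it is convenient to stack them in a design matrix $X$ whose first column is all ones, so that every $\lambda_i$ comes from a single matrix product. The sketch below uses three hypothetical parties and made-up coefficients purely for illustration.

```python
import numpy as np

# Hypothetical design matrix: intercept column, then x1 (e.g. livebait) and x2 (e.g. child).
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 2.0],
])
beta = np.array([0.5, 1.0, -0.8])  # made-up values for beta_0, beta_1, beta_2

# lambda_i = exp(beta_0 + beta_1 * x1_i + beta_2 * x2_i), one value per row of X
lam = np.exp(X @ beta)
print(lam)
```

Because $\exp$ is always positive, any $\beta$ gives a valid rate, and a one-unit increase in $x_k$ (holding the others fixed) multiplies $\lambda_i$ by $\exp(\beta_k)$; rows 1 and 2 above differ only in $x_1$.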

After we construct the individualized $\lambda_i$, the rest of the work is the same.

Now, for customer $i$, the probability of getting $y_i$ counts is simply 

$$PMF_{poisson}(y_i|\lambda_i)=\frac{\exp(-\lambda_i)\lambda_i^{y_i}}{y_i!}$$
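As a quick sanity check of this formula, the sketch below evaluates it on the log scale with `scipy.special.gammaln` (using $\log y! = \log\Gamma(y+1)$) and compares the result with `scipy.stats.poisson.pmf`. The counts and rates are arbitrary illustrative values.

```python
import numpy as np
import scipy.stats as spst
import scipy.special as spsp

y_demo = np.array([0, 1, 3, 7])            # illustrative counts y_i
lam_demo = np.array([0.5, 2.0, 2.0, 4.5])  # illustrative individual rates lambda_i

# log PMF = -lambda_i + y_i * log(lambda_i) - log(y_i!)
log_pmf_manual = -lam_demo + y_demo * np.log(lam_demo) - spsp.gammaln(y_demo + 1)
pmf_manual = np.exp(log_pmf_manual)

pmf_scipy = spst.poisson.pmf(y_demo, lam_demo)
print(np.column_stack([pmf_manual, pmf_scipy]))
```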

Notice that what differentiates a simple Poisson model with covariates from a simple Poisson distribution is that every customer uses a different $\lambda_i$ when computing the individual likelihood.

After that, we compute the joint log likelihood as we did before. Instead of estimating $\beta_0$ only, we estimate $\beta_0$ together with one coefficient per covariate; with the four covariates used below, that is $\beta_0,\dots,\beta_4$.
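Written out for $N$ customers and $K$ covariates, the joint log likelihood to be maximized is the sum of the individual log likelihoods:

$$LL(\beta_0,\dots,\beta_K)=\sum_{i=1}^{N}\log PMF_{poisson}(y_i|\lambda_i)=\sum_{i=1}^{N}\left[-\lambda_i+y_i\log\lambda_i-\log(y_i!)\right],\qquad \log\lambda_i=\beta_0+\sum_{k=1}^{K}\beta_k x_{ki}$$

Since $\log\lambda_i$ is linear in the $\beta$'s, the term $y_i\log\lambda_i$ is linear and $-\lambda_i=-\exp(\log\lambda_i)$ is concave in $\beta$, so this log likelihood is concave, which is why a gradient-based optimizer such as BFGS usually handles it without trouble.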

In [0]:
x1=Fishing["livebait"]
x2=Fishing["child"]
x3=Fishing["persons"]
x4=Fishing["camper"]
y=Fishing["Nfish"]

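def Neg_LL(betas):
  #include observed heterogeneity in lmbda: lambda_i = exp(beta_0 + beta_1*x1 + ... + beta_4*x4)
  lmbda=np.exp(betas[0]+betas[1]*x1+betas[2]*x2+betas[3]*x3+betas[4]*x4)
  #individual likelihood: Poisson PMF of each party's count at its own lambda_i
  Ind_L=spst.poisson.pmf(y,lmbda)
  Ind_LL=np.log(Ind_L)
  return -np.sum(Ind_LL)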

#import warnings
#warnings.simplefilter("ignore")

guess=np.random.rand(5)
model1=minimize(Neg_LL,guess, method="BFGS")

print(model1.fun)
print(model1.x)
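Because the true coefficients behind the fishing data are unknown, one way to convince yourself that this likelihood-plus-BFGS approach works is to run it on simulated data where the truth is known. The sketch below is self-contained; the sample size, the two covariates, and the "true" coefficients are all made up for the check, and the `sim_` prefix keeps it from overwriting the notebook's `x1`, `y`, etc.

```python
import numpy as np
import scipy.stats as spst
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 5000
sim_x1 = rng.integers(0, 2, size=n)   # hypothetical binary covariate
sim_x2 = rng.integers(0, 4, size=n)   # hypothetical count covariate
true_betas = np.array([0.3, 0.8, -0.4])

# Simulate counts from the Poisson model with covariates
sim_lam = np.exp(true_betas[0] + true_betas[1] * sim_x1 + true_betas[2] * sim_x2)
sim_y = rng.poisson(sim_lam)

def neg_ll(betas):
    lmbda = np.exp(betas[0] + betas[1] * sim_x1 + betas[2] * sim_x2)
    # logpmf gives log(pmf) directly and stays finite if a probability underflows
    return -np.sum(spst.poisson.logpmf(sim_y, lmbda))

fit = minimize(neg_ll, np.zeros(3), method="BFGS")
print(fit.x)   # should be close to true_betas
```

Using `logpmf` rather than `np.log(pmf)` is a small design choice: the value is the same, but it avoids `log(0)` when a probability is too small to represent.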

Now every customer (here, every party) has a personalized $\lambda_i$, i.e., a personalized expected number of fish caught.

In [0]:
betas=model1.x
#predict lambda for everyone. This is the expected number of fish caught by each party
predicted_lambda=np.exp(betas[0]+betas[1]*x1+betas[2]*x2+betas[3]*x3+betas[4]*x4)
#add the predicted lambda to the data frame.
Fishing=Fishing.assign(predicted_lambda=predicted_lambda)
Fishing.head(10)
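Once $\lambda_i$ is known, the whole Poisson distribution of a party's catch is available, not just its mean. The sketch below uses a made-up predicted value $\lambda_i = 2.5$ (not an estimate from this data) to compute the probability of catching nothing and of catching at least 5 fish.

```python
import numpy as np
import scipy.stats as spst

lam_i = 2.5  # hypothetical predicted lambda for one party

p_zero = spst.poisson.pmf(0, lam_i)        # P(Y = 0) = exp(-lambda)
p_at_least_5 = spst.poisson.sf(4, lam_i)   # P(Y >= 5) = P(Y > 4) = 1 - P(Y <= 4)
print(round(p_zero, 4), round(p_at_least_5, 4))
```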

### Incorporating the covariates in a mixture model

Recall the individual likelihood function of the two-segment Poisson mixture we had before:

$PMF_{poisson}(y_i|\lambda_1) p+ PMF_{poisson}(y_i|\lambda_2) (1-p)$

$PMF_{poisson}(y_i|\lambda_1)$ is the PMF of a Poisson distribution with $\lambda_1$ at $y_i$. Similarly, $PMF_{poisson}(y_i|\lambda_2)$ is the PMF of a Poisson distribution with $\lambda_2$ at $y_i$. 
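For concreteness, here is a minimal sketch of that individual mixture likelihood as a vectorized function; the function name `mixture_pmf` and the parameter values are illustrative choices, not part of the lecture code. Summed over a long range of counts it should be (numerically) 1, because it is a weighted average of two PMFs.

```python
import numpy as np
import scipy.stats as spst

def mixture_pmf(y, lam1, lam2, p):
    """Two-segment Poisson mixture: p * Pois(y|lam1) + (1 - p) * Pois(y|lam2)."""
    return p * spst.poisson.pmf(y, lam1) + (1 - p) * spst.poisson.pmf(y, lam2)

y_grid = np.arange(0, 200)
probs = mixture_pmf(y_grid, lam1=4.0, lam2=0.5, p=0.35)  # illustrative parameters
print(probs[:5], probs.sum())
```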



So here we have three options:

(1) Use the covariates to model $\lambda_1$ as $\exp(\beta_0+\beta_1 x_1+\beta_2 x_2)$

(2) Use the covariates to model $\lambda_2$ as $\exp(\beta_0+\beta_1 x_1+\beta_2 x_2)$

(3) Use the covariates to model $p$ as $\frac{\exp(\beta_0+\beta_1 x_1+\beta_2 x_2)}{\exp(\beta_0+\beta_1 x_1+\beta_2 x_2)+1}$, which keeps $p$ between 0 and 1

Each option gets its own set of $\beta$ coefficients. Of course, you can also incorporate the covariates in two of the parameters or in all three. However, every extra set of coefficients increases the risk of overfitting, and the optimizer may fail to converge altogether.
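Option (3) uses the logistic (inverse-logit) transform so that $p$ stays strictly between 0 and 1 whatever the coefficients are. SciPy provides this transform as `scipy.special.expit`; the sketch below checks it against the formula in option (3) for a few arbitrary values of the linear predictor $z=\beta_0+\beta_1 x_1+\beta_2 x_2$.

```python
import numpy as np
import scipy.special as spsp

z = np.array([-5.0, -1.0, 0.0, 2.0, 5.0])  # arbitrary linear-predictor values

p_formula = np.exp(z) / (np.exp(z) + 1)
p_expit = spsp.expit(z)   # 1 / (1 + exp(-z)), computed in a numerically stable way
print(p_expit)
```

`expit` is the safer choice inside an optimizer: for a very large positive `z`, `np.exp(z)` overflows to `inf` and the formula becomes `inf/inf`, while `expit` returns a value at or just below 1.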





In [0]:
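#Let's estimate a zero-inflated Poisson model. We incorporate the covariates in both
#lambda1 and p; the second segment is a point mass at zero (lambda2 = 0).
def Neg_LL(betas):
  #betas[0:5]: coefficients for lambda1 (log link)
  lmbda=np.exp(betas[0]+betas[1]*x1+betas[2]*x2+betas[3]*x3+betas[4]*x4)
  #betas[5:10]: coefficients for p (logistic link, keeps p in (0,1))
  p=spsp.expit(betas[5]+betas[6]*x1+betas[7]*x2+betas[8]*x3+betas[9]*x4)
  Ind_L1=spst.poisson.pmf(y,lmbda)
  #a Poisson with lambda=0 puts all its mass at 0; written out explicitly
  #so the result does not depend on how the SciPy version treats mu=0
  Ind_L2=(y==0).astype(float)
  Ind_L=Ind_L1*p+Ind_L2*(1-p)
  Ind_LL=np.log(Ind_L)
  return -np.sum(Ind_LL)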



In [0]:
guess=np.random.rand(10)
model2=minimize(Neg_LL,guess, method="BFGS")
 
print(model2.fun)
print(model2.x)

### Posterior Analysis

Because segment membership is still unobserved heterogeneity, we can again conduct posterior analysis. If we see a customer for one period, we can use the outcome to update the probability that this customer belongs to each segment, which in turn updates the expected number of exposures for this customer if we see this customer again.
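For the zero-inflated model, Bayes' rule gives the posterior probability that party $i$ belongs to the Poisson ("fishing") segment:
$P(\text{segment 1}|y_i)=\frac{p_i\,PMF_{poisson}(y_i|\lambda_i)}{p_i\,PMF_{poisson}(y_i|\lambda_i)+(1-p_i)\mathbf{1}\{y_i=0\}}$.
Any positive count reveals that the party is in segment 1; only zeros are ambiguous. The sketch below uses made-up values of $p_i$ and $\lambda_i$ rather than fitted ones, and the function name `posterior_seg1` is hypothetical.

```python
import numpy as np
import scipy.stats as spst

def posterior_seg1(y, lam, p):
    """P(party is in the Poisson segment | y) for a zero-inflated Poisson."""
    num = p * spst.poisson.pmf(y, lam)
    den = num + (1 - p) * (y == 0)   # the zero segment only produces y = 0
    return num / den

y_obs = np.array([0, 0, 3])
lam_demo = np.array([2.0, 0.2, 2.0])  # hypothetical lambda_i
p_demo = np.array([0.6, 0.6, 0.6])    # hypothetical p_i

post = posterior_seg1(y_obs, lam_demo, p_demo)
# Updated expected count for a repeat visit: the zero segment contributes 0,
# so E[Y_next | y] = P(segment 1 | y) * lambda_i
expected_next = post * lam_demo
print(post, expected_next)
```

A zero is more likely to come from the zero segment when the party's $\lambda_i$ is large (first party), because a fisher with a high rate rarely catches nothing.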

## Incorporating the covariates in an NBD model

For a simple Poisson model, we assumed that 

$\lambda_i=\exp(\beta_0+ \beta_1 x_{1i}+ \beta_2 x_{2i})=\lambda_0\exp(\beta_1 x_{1i}+ \beta_2 x_{2i})$, where $\lambda_0=\exp(\beta_0)$.

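It is reasonable to assume that people with the same covariates still differ in their baseline rate $\lambda_0$. As before, let's assume $\lambda_0 \sim Gamma(\gamma, \alpha)$, with shape $\gamma$ and rate $\alpha$.

Since multiplying a Gamma random variable by a positive constant divides its rate by that constant, we will have

$\lambda_i=\lambda_0\exp(\beta_1 x_{1i}+ \beta_2 x_{2i})\sim Gamma \left(\gamma, \frac{\alpha}{\exp(\beta_1x_{1i}+\beta_2 x_{2i})}\right)$.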

Thus, the individual likelihood function for the NBD with covariates is equal to:

$\frac{\Gamma(\gamma+y_i)}{\Gamma(\gamma)\Gamma(y_i+1)}\left(\frac{\exp(\beta_1 x_{1i} + \beta_2 x_{2i})}{\alpha+\exp(\beta_1 x_{1i} + \beta_2 x_{2i})}\right)^{y_i}\left(\frac{\alpha}{\alpha+\exp(\beta_1 x_{1i} + \beta_2 x_{2i})}\right)^\gamma$

For this distribution, $E(y_i)=\frac{\gamma }{\alpha}\exp(\beta_1 x_{1i} + \beta_2 x_{2i})$.
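One way to check this likelihood is to note that it is a negative binomial PMF in SciPy's parameterization, `nbinom.pmf(k, n, p)` $=\frac{\Gamma(k+n)}{\Gamma(n)\Gamma(k+1)}p^n(1-p)^k$, with $n=\gamma$ and $p=\frac{\alpha}{\alpha+c_i}$, where $c_i=\exp(\beta_1 x_{1i}+\beta_2 x_{2i})$. The sketch evaluates the expression directly on the log scale (via `gammaln`) for illustrative parameter values and compares both the PMF and the mean with SciPy.

```python
import numpy as np
import scipy.stats as spst
import scipy.special as spsp

gamma_demo, alpha_demo = 1.7, 0.9   # illustrative gamma (shape) and alpha (rate)
c = np.exp(0.4)                     # illustrative exp(beta_1*x_1 + beta_2*x_2) for one party
y_grid = np.arange(0, 10)

# Direct evaluation of the NBD-with-covariates likelihood, term by term
log_pmf = (spsp.gammaln(gamma_demo + y_grid) - spsp.gammaln(gamma_demo) - spsp.gammaln(y_grid + 1)
           + y_grid * np.log(c / (alpha_demo + c))
           + gamma_demo * np.log(alpha_demo / (alpha_demo + c)))
pmf_direct = np.exp(log_pmf)

# Same distribution in SciPy's negative binomial parameterization
p_nb = alpha_demo / (alpha_demo + c)
pmf_scipy = spst.nbinom.pmf(y_grid, gamma_demo, p_nb)
mean_scipy = spst.nbinom.mean(gamma_demo, p_nb)
print(np.allclose(pmf_direct, pmf_scipy), mean_scipy, gamma_demo / alpha_demo * c)
```

Working on the log scale avoids overflow in $\Gamma(\gamma+y)$ for large counts, which is also a reason to prefer `gammaln` over `scipy.special.gamma` inside the estimation code.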


In [0]:
def Neg_LL(betas):
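  #assumed parameter layout: betas[0]=log(alpha), betas[1]=log(gamma), betas[2:6]=slopes
  alpha=np.exp(betas[0])   # exp keeps alpha > 0
  #no intercept here: lambda_0 ~ Gamma(gamma, alpha) plays that role
  exp_part=np.exp(betas[2]*x1+betas[3]*x2+betas[4]*x3+betas[5]*x4)
  gamma=np.exp(betas[1])   # exp keeps gamma > 0
  #NBD with covariates; gammaln avoids overflow in the Gamma functions
  ind_L=(np.exp(spsp.gammaln(gamma+y)-spsp.gammaln(gamma)-spsp.gammaln(y+1))
         *(exp_part/(alpha+exp_part))**y*(alpha/(alpha+exp_part))**gamma)
  #individual log likelihood
  ind_LL=np.log(ind_L)
  #joint log likelihood
  Joint_LL=np.sum(ind_LL)
  return -Joint_LL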
  
 

In [0]:
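guess=np.random.rand(6)   # one starting value per element of betas
result=minimize(Neg_LL,guess,method="BFGS")
betas=result.x

print(betas)
print(result.fun)
#expected fish per party: (gamma/alpha)*exp(beta_1*x1+...+beta_4*x4)
#assumed layout: betas[0]=log(alpha), betas[1]=log(gamma), betas[2:6]=slopes
alpha=np.exp(betas[0])
gamma=np.exp(betas[1])
exp_part=np.exp(betas[2]*x1+betas[3]*x2+betas[4]*x3+betas[5]*x4)
lambda_predict_nbd=gamma/alpha*exp_part
Fishing=Fishing.assign(lambda_predict_nbd=lambda_predict_nbd)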


### Posterior Analysis

Previously, we had

$P(y|\lambda) \sim Poisson$ and $\lambda\sim Gamma (\gamma,\alpha)$. We derived that the posterior distribution is $\lambda|y \sim Gamma(\gamma+y, \alpha+1)$.

When incorporating the covariates, we have

$P(y_i|\lambda_i) \sim Poisson$ and $\lambda_i\sim Gamma \left(\gamma,\frac{\alpha}{\exp(\beta_1x_{1i}+\beta_2x_{2i})}\right)$. Applying the same update with the rate $\frac{\alpha}{\exp(\beta_1x_{1i}+\beta_2x_{2i})}$ in place of $\alpha$, the posterior distribution is $\lambda_i|y_i \sim Gamma\left(\gamma+y_i, \frac{\alpha}{\exp(\beta_1x_{1i}+\beta_2x_{2i})}+1\right)$.

Thus, the expected count (here, the number of fish caught) for a customer conditional on observing $y_i$ is equal to

$$\frac{\gamma+y_i}{\frac{\alpha}{\exp(\beta_1x_{1i}+\beta_2x_{2i})}+1}=\exp(\beta_1x_{1i}+\beta_2x_{2i})\frac{\gamma+y_i}{\alpha+\exp(\beta_1x_{1i}+\beta_2x_{2i})}$$
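A useful way to read this posterior mean: with $c_i=\exp(\beta_1x_{1i}+\beta_2x_{2i})$, it is a weighted average of the prior mean $\frac{\gamma}{\alpha}c_i$ and the observed count $y_i$, with weight $\frac{c_i}{\alpha+c_i}$ on $y_i$ and $\frac{\alpha}{\alpha+c_i}$ on the prior mean. The sketch below verifies this numerically for illustrative parameter values (not fitted ones); the helper name `posterior_mean_nbd` is hypothetical.

```python
import numpy as np

def posterior_mean_nbd(y, gamma, alpha, c):
    """E[lambda_i | y_i] = c * (gamma + y) / (alpha + c), with c = exp(beta_1*x_1 + beta_2*x_2)."""
    return c * (gamma + y) / (alpha + c)

gamma_demo, alpha_demo = 1.7, 0.9           # illustrative gamma and alpha
c = np.exp(np.array([0.0, 0.4, 1.2]))       # illustrative covariate effects for three parties
y_obs = np.array([0, 3, 10])                # illustrative observed counts

post_mean = posterior_mean_nbd(y_obs, gamma_demo, alpha_demo, c)
prior_mean = gamma_demo / alpha_demo * c
w = c / (alpha_demo + c)                    # weight placed on the observed count
print(post_mean)
```

Parties whose covariates imply a large $c_i$ put more weight on their own observed count, while a small $c_i$ pulls the prediction toward the prior mean.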


In [0]:
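#posterior mean exp_part*(gamma+y)/(alpha+exp_part); assumed layout: alpha=exp(betas[0]), gamma=exp(betas[1])
exp_part=np.exp(betas[2]*x1+betas[3]*x2+betas[4]*x3+betas[5]*x4)
lambda_posterior=exp_part*(np.exp(betas[1])+y)/(np.exp(betas[0])+exp_part)
Fishing=Fishing.assign(lambda_posterior=lambda_posterior)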