# Poisson Regression Demonstration In Python

### Using Complaints Data


## Background 
A company recently launched a loyalty program and used it to collect information about its customers. The next leg of the program aims to add 20,000 customers to the loyalty pool. The company wants to know whether the number of complaints per customer can be modeled, so that it can staff a call centre appropriately.

- Sample size is 113

- Information is available about Region, Loyalty Tier, Complaints and Customer’s Association with the Company

## Objective 

To model the number of complaints per customer, in order to prepare a road map for the call centre in the next leg of the loyalty program.



## Data Description

| Columns | Description | Type |
|----------|--------------|------|
| custid | Customer ID | character |
| region | Region to which the customer belongs | categorical |
| tier | Loyalty program tier of the customer | categorical |
| age | Customer’s association with the company (levels such as `less2`, `more2`) | categorical |
| ncomp | Number of complaints | integer (count) |


### Import Libraries

In [1]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import scipy.stats as stats




### Load Data

In [2]:
df = pd.read_csv('Complaints.csv')
df.head()

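| Unnamed: 0 | custid | region | tier | age | ncomp |
|------------|--------|--------|------|-----|-------|
| 0 | 1 | N | platinum | less2 | 0 |
| 1 | 2 | W | gold | more2 | 3 |
| 2 | 3 | W | silver | less2 | 9 |
| 3 | 4 | S | silver | less2 | 6 |
| 4 | 5 | E | silver | less2 | 7 |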


### Build Poisson Regression Model

In [3]:
# Fit Poisson regression using GLM
glm_poisson = smf.glm(formula='ncomp ~ region + tier + age', 
                      data=df, 
                      family=sm.families.Poisson()).fit()

# Display the summary
print(glm_poisson.summary())

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  ncomp   No. Observations:                  113
Model:                            GLM   Df Residuals:                      106
Model Family:                 Poisson   Df Model:                            6
Link Function:                    Log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -221.43
Date:                Tue, 14 Oct 2025   Deviance:                       116.01
Time:                        14:42:48   Pearson chi2:                     107.
No. Iterations:                     4   Pseudo R-squ. (CS):             0.4621
Covariance Type:            nonrobust                                         
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            1.2919      0.139  
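
Because the model uses a log link, every coefficient in the summary is on the log scale, and exponentiating it gives a multiplicative effect on the expected number of complaints. The sketch below applies this to the intercept reported above (1.2919). It assumes the default treatment coding of the formula interface, under which the intercept describes a customer at the reference level of each categorical predictor; the coefficient of -0.5 is hypothetical and serves only to illustrate the conversion.

```python
import math

# Intercept copied from the summary output above (log scale).
intercept = 1.2919

# With a log link, exp(intercept) is the expected number of complaints for a
# customer at the reference level of every categorical predictor
# (assumption: default treatment coding).
baseline_rate = math.exp(intercept)


def rate_ratio(coef):
    """Multiplicative change in the expected count for a log-scale coefficient."""
    return math.exp(coef)


print(f"Baseline expected complaints: {baseline_rate:.2f}")

# Hypothetical coefficient, not taken from the fitted model.
print(f"Rate ratio for coef = -0.5: {rate_ratio(-0.5):.3f}")
```

A rate ratio below 1 means fewer expected complaints than the reference level, and a ratio above 1 means more.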

### Prediction

In [4]:
df['pred'] = glm_poisson.predict()
df.head()

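| Unnamed: 0 | custid | region | tier | age | ncomp | pred |
|------------|--------|--------|------|-----|-------|------|
| 0 | 1 | N | platinum | less2 | 0 | 1.638791 |
| 1 | 2 | W | gold | more2 | 3 | 2.769746 |
| 2 | 3 | W | silver | less2 | 9 | 3.607401 |
| 3 | 4 | S | silver | less2 | 6 | 4.500642 |
| 4 | 5 | E | silver | less2 | 7 | 5.656583 |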
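
Each value in `pred` is the fitted mean of a Poisson distribution rather than a whole-number count, which is why the predictions are fractional. As a minimal sketch of how such a mean translates into probabilities for individual complaint counts, the snippet below evaluates the Poisson probability mass function for the first customer's fitted mean. The helper `poisson_pmf` is illustrative and not part of the notebook.

```python
import math


def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the Poisson mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Fitted mean for the first customer in the output above.
mu_first = 1.638791

# Probability of each complaint count from 0 to 4 for that customer.
probs = [poisson_pmf(k, mu_first) for k in range(5)]
for k, p in enumerate(probs):
    print(f"P(ncomp = {k}) = {p:.3f}")

# The remaining probability mass covers 5 or more complaints.
p_five_plus = 1 - sum(probs)
print(f"P(ncomp >= 5) = {p_five_plus:.3f}")
```

Tail probabilities of this kind may be more useful for call-centre planning than the mean alone, since they describe how often a customer is expected to generate many complaints.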


### Goodness Of Fit

In [5]:

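# Residual deviance and residual degrees of freedom of the fitted model
res_deviance = glm_poisson.deviance
df_resid = glm_poisson.df_resid  # separate name, so the DataFrame `df` is not overwritten

# Compute p-value: upper-tail probability of a chi-square with df_resid degrees of freedom
pvalue = 1 - stats.chi2.cdf(res_deviance, df_resid)
print(pvalue)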

0.23801544910011962
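
A p-value of about 0.238 is above the conventional 0.05 cutoff, so the deviance test gives no evidence of lack of fit. A complementary check, sketched here under the assumption that the figures in the model summary are representative, is the dispersion ratio: the deviance or Pearson chi-square divided by the residual degrees of freedom. The numbers are copied from the summary, where the Pearson statistic appears rounded to 107.

```python
# Values copied from the model summary above.
deviance = 116.01
pearson_chi2 = 107.0  # shown rounded in the summary
df_resid = 106

# Ratios close to 1 are consistent with the Poisson assumption that the
# variance equals the mean; ratios well above 1 suggest overdispersion.
deviance_ratio = deviance / df_resid
pearson_ratio = pearson_chi2 / df_resid

print(f"Deviance / df:     {deviance_ratio:.3f}")
print(f"Pearson chi2 / df: {pearson_ratio:.3f}")
```

Both ratios are close to 1 here, which agrees with the deviance test.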
