In [None]:
import pandas as pd
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

## Problem Type

Poisson regression is the regression we use to estimate the effect of explanatory variables on a dependent variable that is count data (non-negative integers). When the log of the expected count is modeled as a linear combination of the explanatory variables, Poisson regression is a special case of the generalized linear model.

It is called "Poisson" because the Poisson distribution is used as the likelihood of the dependent variable.

What we get out of this type of model is the relative contribution of each explanatory variable to the value of the dependent variable.
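
As a minimal sketch of the idea, the snippet below simulates count data from a Poisson model whose rate is the exponential of a linear predictor. The coefficient values, the sample size, and the single explanatory variable `x` are made up for illustration and are not taken from the ship-damage data used later.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical intercept and slope, chosen only for illustration.
beta_0, beta_1 = 0.5, 0.8
x = rng.uniform(0, 2, size=1000)

# Linear predictor on the log scale; exp() maps it to a positive rate.
log_mu = beta_0 + beta_1 * x
mu = np.exp(log_mu)

# Counts are drawn from a Poisson distribution with that rate.
y = rng.poisson(mu)

# A one-unit increase in x multiplies the expected count by exp(beta_1).
rate_ratio = np.exp(beta_1)
```

Because the coefficients act on the log of the expected count, their effect on the count itself is multiplicative rather than additive.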

## Data structure

For this model, the data should be structured as follows:

- Each row is one measurement.
- The columns should be:
    - One column per explanatory variable.
        - Use an ordinal encoding where possible; strictly categorical variables should be binarized (one indicator column per category).
    - One column for the dependent variable.
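
One way to get data into this shape is sketched below with `pd.get_dummies`. The toy frame and the `ship_type` column are hypothetical and only illustrate the binarization step; they are not part of the dataset loaded later.

```python
import pandas as pd

# Hypothetical toy data; values and the ship_type column are illustrative only.
raw = pd.DataFrame({
    "ship_type": ["A", "B", "C", "A"],   # strictly categorical
    "period_op": [1, 2, 1, 2],           # ordinal, kept as-is
    "n_damages": [0, 3, 1, 5],           # dependent variable (counts)
})

# Replace the categorical column with one indicator column per category.
tidy = pd.get_dummies(raw, columns=["ship_type"])
```

Each row is still one measurement, and `tidy` now has one numeric or indicator column per explanatory variable plus the dependent variable.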

## Extensions to the model

None.

## Reporting summarized findings

Here are examples of how to summarize the findings.

> For every unit increase in $X_i$, we expect the log of the expected value of $Y$ to increase by `mean` (95% HPD: [`lower`, `upper`]).
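
The `mean`, `lower`, and `upper` placeholders can be computed from posterior samples. The sketch below uses a hand-written helper, `hpd_interval`, which is a hypothetical name and not a PyMC3 function, and it uses simulated draws as a stand-in for a real trace.

```python
import numpy as np

def hpd_interval(samples, cred_mass=0.95):
    """Narrowest interval containing `cred_mass` of the samples."""
    sorted_samples = np.sort(np.asarray(samples))
    n = len(sorted_samples)
    n_in = int(np.ceil(cred_mass * n))  # number of samples inside the interval
    # Width of every candidate interval spanning n_in consecutive samples.
    widths = sorted_samples[n_in - 1:] - sorted_samples[:n - n_in + 1]
    start = int(np.argmin(widths))
    return sorted_samples[start], sorted_samples[start + n_in - 1]

# Stand-in for posterior draws of one coefficient (not real model output).
rng = np.random.default_rng(0)
beta_samples = rng.normal(loc=0.3, scale=0.05, size=4000)

mean = beta_samples.mean()
lower, upper = hpd_interval(beta_samples)
print(f"mean = {mean:.3f} (95% HPD: [{lower:.3f}, {upper:.3f}])")
```

With a real model, `beta_samples` would be replaced by the draws for one coefficient taken from the trace.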

## Other notes

None.

In [None]:
df = pd.read_csv('datasets/ship-damage.txt')
# Log10-transform the months column.
df['months'] = np.log10(df['months'])
df.head()

In [None]:
plt.scatter(x=df['months'], y=df['n_damages'])

In [None]:
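with pm.Model() as model:
    # One coefficient per explanatory variable; sd=100**2 is a very wide prior (sd = 10,000).
    betas = pm.Normal('betas', mu=0, sd=100**2, shape=(3, 1))
    # Linear predictor on the log scale (this model has no intercept term).
    log_mu = (betas[0] * df['yr_construction']
              + betas[1] * df['period_op']
              + betas[2] * df['months'])

    # exp() inverts the log link and keeps the Poisson rate positive.
    n_damages_like = pm.Poisson('likelihood', mu=pm.math.exp(log_mu), observed=df['n_damages'])
    trace = pm.sample(draws=2000)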

In [None]:
pm.traceplot(trace)

In [None]:
pm.forestplot(trace, ylabels=['yr_construction', 'period_op', 'months'])

The best interpretation of this plot is that, of the three explanatory variables, the log10 of the number of months a ship has been in service is the strongest positive contributor to the number of damage incidents the ship sustains.
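
To make the scale of this coefficient concrete: because `months` enters the model as log10 and the Poisson rate is the exponential of the linear predictor, a tenfold increase in months multiplies the expected count by `exp(beta)`. The sketch below works through that arithmetic with a hypothetical coefficient value, not the fitted one, and `expected_ratio` is a helper name invented for this example.

```python
import numpy as np

beta_months = 1.0  # hypothetical value, NOT the fitted coefficient

def expected_ratio(months_new, months_old, beta):
    """Ratio of expected counts when only `months` changes."""
    return np.exp(beta * (np.log10(months_new) - np.log10(months_old)))

# Tenfold increase: log10(months) rises by 1, so the ratio is exp(beta).
tenfold = expected_ratio(1000, 100, beta_months)

# Doubling: the ratio is 2 ** (beta / ln 10).
doubling = expected_ratio(200, 100, beta_months)
```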

In [None]:
pm.summary(trace)