# Intro

Goal: 

- try out a basic Poisson GLM on simulated data
- demonstrate what happens when the Poisson rate has a linear or a logarithmic relationship with a feature

In [36]:
import numpy as np
import pandas as pd

from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance

TODO: 
- `tol` seems to be a bound on the maximum absolute gradient component, but then what's the point? That seems very loose.
- recalculate the likelihoods: how do they work with a regularisation factor?
- GLM from scratch? 

# Poisson 1

We have one continuous feature. The lambda parameter has a linear relationship with the feature. 

Mistakes you can make: 
- not applying a log transformation
- leaving alpha at default

In [2]:
# generate data
# one feature, feature_1; the Poisson lambda is linear in it (lambda = feature_1 / 10000)

df = pd.DataFrame.from_dict({'feature_1' : np.repeat(range(1000,10001),100)})
df['lambda'] = df['feature_1'] / 10000
df['claim_number'] = np.random.poisson(df['lambda'])
df['log_feature_1'] = np.log(df['feature_1'])

In [3]:
sum(df['lambda'])

495054.99999998964

In [4]:
sum(df['claim_number'])

495351
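A quick sanity check on those two totals: a sum of independent Poisson variables is itself Poisson, with mean (and variance) equal to the sum of the individual lambdas. The sketch below recomputes that mean for the same feature grid and measures how far the observed total printed above is from it, in standard deviations. The observed total is copied from the output above; the data was drawn without a seed, so a rerun will give a different number.

```python
import numpy as np

# Same feature grid as the simulation: 1000..10000, each value repeated 100 times
feature_1 = np.repeat(np.arange(1000, 10001), 100)
lam = feature_1 / 10000

# Sum of independent Poisson(lambda_i) is Poisson(sum of lambda_i),
# so its mean and variance both equal the summed lambdas
expected_total = lam.sum()
sd_total = np.sqrt(expected_total)

observed_total = 495351  # total claim_number printed above (unseeded draw)
z = (observed_total - expected_total) / sd_total
print(f"expected={expected_total:.0f}, sd={sd_total:.1f}, z={z:.2f}")
# prints: expected=495055, sd=703.6, z=0.42
```

The observed total sits well within one standard deviation of its expectation, as it should.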

In [11]:
# In log-link GLM terms, what we want is:
# predicted = exp(intercept) * feature_1 ** coef = 1/10000 * feature_1 ** 1
# so mdl.coef_ should be 1, while mdl.intercept_ should be log(1/10000)
np.log(1/10000)

-9.210340371976182
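Taking logs of the simulated rate shows why those are the targets (with $x$ = `feature_1`):

$$
\lambda = \frac{x}{10000}
\;\Longrightarrow\;
\log\lambda = \log\frac{1}{10000} + 1\cdot\log x
\;\Longrightarrow\;
\lambda = e^{\beta_0}\,x^{\beta_1}\ \text{with}\ \beta_0 = \log\frac{1}{10000} \approx -9.2103,\ \beta_1 = 1.
$$

So a log-link GLM on `log_feature_1` is correctly specified for this data, while one on the raw `feature_1` is not: $\exp(\beta_0 + \beta_1 x)$ is exponential in $x$, not linear.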

## Without Log Transformation

Let's see what happens if we try to put in the feature as it is. 
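A minimal sketch of that experiment, under assumptions that are not from the original: a smaller copy of the data (each feature value repeated 10 times instead of 100), a fixed seed, `alpha=0` so that only the missing transformation is at play, and the raw feature divided by 10000 purely to keep the optimiser well conditioned (with no penalty, a linear rescaling changes the fitted coefficient but not the fitted curve). It fits `PoissonRegressor` once on the rescaled raw feature and once on its log, then compares mean Poisson deviance and the worst-case gap to the true lambda.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# Same generating process as above, but 10 repeats per value instead of 100
feature_1 = np.repeat(np.arange(1000, 10001), 10)
lam = feature_1 / 10000
y = rng.poisson(lam)

# "Raw" feature: no log transformation, only rescaled to [0.1, 1]
X_raw = (feature_1 / 10000).reshape(-1, 1)
# Log-transformed feature, as used in the rest of the notebook
X_log = np.log(feature_1).reshape(-1, 1)

mdl_raw = PoissonRegressor(alpha=0).fit(X_raw, y)
mdl_log = PoissonRegressor(alpha=0).fit(X_log, y)

pred_raw = mdl_raw.predict(X_raw)
pred_log = mdl_log.predict(X_log)

# The raw model can only produce exp(a + b * x), a convex curve in x,
# so it cannot follow a rate that is a straight line in x
dev_raw = mean_poisson_deviance(y, pred_raw)
dev_log = mean_poisson_deviance(y, pred_log)
print(f"mean deviance: raw={dev_raw:.4f}, log={dev_log:.4f}")
print(f"max |pred - lambda|: raw={np.abs(pred_raw - lam).max():.3f}, "
      f"log={np.abs(pred_log - lam).max():.3f}")
```

The raw-feature model has to bend an exponential through a straight line, so it overshoots at both ends of the feature range and undershoots in the middle; the log-feature model has no such structural error.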

## With Log Transformation and Default Regularisation (alpha = 1)

In [25]:
mdl = PoissonRegressor()  # default alpha=1.0: an L2 penalty on the coefficient
X = df[['log_feature_1']]
y = df['claim_number']
mdl.fit(X, y)
df['pred_claim_number'] = mdl.predict(X)

In [26]:
df.sample(5)

| | feature_1 | lambda | claim_number | log_feature_1 | pred_claim_number |
|---|---|---|---|---|---|
| 753624 | 8536 | 0.8536 | 1 | 9.052048 | 0.590219 |
| 640748 | 7407 | 0.7407 | 0 | 8.910181 | 0.579935 |
| 668588 | 7685 | 0.7685 | 0 | 8.947026 | 0.582589 |
| 88205 | 1882 | 0.1882 | 0 | 7.54009 | 0.489386 |
| 528165 | 6281 | 0.6281 | 1 | 8.745284 | 0.568206 |


In [27]:
mdl.intercept_

-1.6488745873364938

In [28]:
mdl.coef_

array([0.12390718])

In [29]:
df[df.feature_1 == 10000][:1]

| | feature_1 | lambda | claim_number | log_feature_1 | pred_claim_number |
|---|---|---|---|---|---|
| 900000 | 10000 | 1.0 | 1 | 9.21034 | 0.60191 |


In [30]:
np.exp(mdl.intercept_ + np.log(10000) * mdl.coef_[0])

0.6019100525328023

In [31]:
# alternatively: 
np.exp(mdl.intercept_) * 10000 ** mdl.coef_[0]

0.6019100525328022

The reason, of course, is the regularisation parameter. Let's demonstrate that this intercept-coefficient pair minimises the penalised objective when alpha is 1.

$\alpha$ is the strength of the L2 penalty; scikit-learn applies it to the coefficients only, not to the intercept.

Handy summary of the model objective function here: https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-models
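Written out for this one-feature model, following the objective on that page (with $x_i$ = `feature_1` and $n$ rows), scikit-learn minimises

$$
\frac{1}{2n}\sum_{i=1}^{n} d(y_i, \hat y_i) + \frac{\alpha}{2}\beta_1^2,
\qquad
\hat y_i = \exp(\beta_0 + \beta_1 \log x_i),
$$

where the Poisson unit deviance is

$$
d(y, \hat y) = 2\left(y\log\frac{y}{\hat y} - y + \hat y\right),
$$

with $y\log(y/\hat y)$ taken as 0 when $y = 0$. Only $\beta_1$ is penalised, not the intercept $\beta_0$. The first term is half of `mean_poisson_deviance`, which is how `calculate_model_objective` computes it.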

In [24]:
def predict_lambda(log_feature_1, intercept, coef):
    # log link: lambda = exp(intercept + coef * log(feature_1))
    return np.exp(intercept + log_feature_1 * coef)

In [33]:
my_pred = predict_lambda(df['log_feature_1'], mdl.intercept_, mdl.coef_)

df['my_pred'] = my_pred

In [35]:
df.sample(3)

| | feature_1 | lambda | claim_number | log_feature_1 | pred_claim_number | my_pred |
|---|---|---|---|---|---|---|
| 567423 | 6674 | 0.6674 | 0 | 8.805975 | 0.572495 | 0.572495 |
| 195288 | 2952 | 0.2952 | 0 | 7.990238 | 0.517458 | 0.517458 |
| 776553 | 8765 | 0.8765 | 0 | 9.078522 | 0.592159 | 0.592159 |


In [37]:
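def calculate_model_objective(actu, log_feature_1, intercept, coef, alpha):
    # PoissonRegressor objective: half the mean Poisson deviance
    # plus an L2 penalty on the coefficient (the intercept is not penalised)
    my_pred = predict_lambda(log_feature_1, intercept, coef)
    return (1/2 * mean_poisson_deviance(actu, my_pred)) + (alpha/2 * (coef ** 2))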

In [38]:
calculate_model_objective(df['claim_number'], df['log_feature_1'], mdl.intercept_, mdl.coef_, 1)

array([0.54145969])

In [44]:
# Coarse grid search over (intercept, coef) for the alpha = 1 objective
curr_min_objective_value = np.inf
optimal_intercept = None
optimal_coef = None

for intercept in np.arange(-1.75, -1.55, 0.01):
    for coef in np.arange(0.03, 0.23, 0.01):
        curr_objective_value = calculate_model_objective(
            df['claim_number'], df['log_feature_1'], intercept, coef, 1)
        if curr_objective_value < curr_min_objective_value:
            curr_min_objective_value = curr_objective_value
            optimal_intercept = intercept
            optimal_coef = coef

In [46]:
round(optimal_intercept,2)

-1.62

In [47]:
round(optimal_coef,2)

0.12

In [48]:
calculate_model_objective(df['claim_number'], df['log_feature_1'], mdl.intercept_, mdl.coef_, 1)

array([0.54145969])

In [49]:
calculate_model_objective(df['claim_number'], df['log_feature_1'], optimal_intercept, optimal_coef, 1)

0.5414739455585038
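The grid has a step of 0.01, and the intercept and coefficient are strongly correlated here (every `log_feature_1` value lies between about 6.9 and 9.2, far from 0), so the best grid point is only a coarse approximation of the minimum. A tighter check, sketched below under stated assumptions, minimises the same objective with `scipy.optimize.minimize` (BFGS with a hand-written gradient) on a smaller simulated copy of the data (10 repeats per value, fixed seed) and compares the result with `PoissonRegressor(alpha=1)` run at a much tighter `tol`. The helper `objective_and_grad` is hypothetical, written for this illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)  # assumed seed
feature_1 = np.repeat(np.arange(1000, 10001), 10)  # 10 repeats instead of 100
x = np.log(feature_1)
y = rng.poisson(feature_1 / 10000)
alpha = 1.0


def objective_and_grad(params):
    """Hypothetical helper: half mean Poisson deviance + alpha/2 * coef**2, and its gradient."""
    intercept, coef = params
    mu = np.exp(intercept + coef * x)
    # Half of the mean Poisson deviance; xlogy(y, y / mu) is 0 where y == 0
    half_dev = np.mean(xlogy(y, y / mu) - y + mu)
    value = half_dev + alpha / 2 * coef ** 2
    # d(half_dev)/d(eta_i) = (mu_i - y_i) / n; the intercept is not penalised
    resid = mu - y
    grad = np.array([resid.mean(), (resid * x).mean() + alpha * coef])
    return value, grad


# Start from the intercept-only solution: intercept = log(mean(y)), coef = 0
x0 = np.array([np.log(y.mean()), 0.0])
res = minimize(objective_and_grad, x0, jac=True, method="BFGS")

mdl = PoissonRegressor(alpha=alpha, tol=1e-10, max_iter=10_000).fit(x.reshape(-1, 1), y)

print("scipy  :", res.x, res.fun)
print("sklearn:", mdl.intercept_, mdl.coef_[0])
```

Because the penalised objective is strictly convex, both optimisers should land on the same point; any gap between the grid result and `mdl` above comes from the grid spacing, not from a different objective.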

In [50]:
calculate_model_objective(df['claim_number'], df['log_feature_1'], np.log(1/10000), 1, 1)

0.9842972255248361
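Splitting both objective values into their two terms, using only the numbers printed above, shows where the difference comes from: the simulation parameters actually fit the data better (smaller deviance term), but at alpha = 1 a coefficient of 1 costs 1/2 * 1^2 = 0.5 in penalty, against under 0.01 for the fitted coefficient.

```python
alpha = 1.0

# (objective value, coefficient) copied from the outputs above
runs = {
    "fitted": (0.54145969, 0.12390718),       # PoissonRegressor(alpha=1)
    "true": (0.9842972255248361, 1.0),        # simulation parameters
}

split = {}
for name, (obj, coef) in runs.items():
    penalty = alpha / 2 * coef ** 2
    half_deviance = obj - penalty
    split[name] = (half_deviance, penalty)
    print(f"{name:>6}: half deviance = {half_deviance:.4f}, penalty = {penalty:.4f}")
# fitted: half deviance = 0.5338, penalty = 0.0077
#   true: half deviance = 0.4843, penalty = 0.5000
```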

## With Log Transformation and No Regularisation (alpha = 0)

If we take out regularisation in this simple example, we expect to recover parameters close to those used in the simulation, up to sampling noise.

In [18]:
mdl = PoissonRegressor(alpha=0)  # no L2 penalty: plain maximum likelihood
X = df[['log_feature_1']]
y = df['claim_number']
mdl.fit(X, y)
df['pred_claim_number'] = mdl.predict(X)

In [19]:
mdl.intercept_

-9.175760280526253

In [20]:
mdl.coef_

array([0.99610799])
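The estimates are not exactly log(1/10000) ≈ -9.2103 and 1 because the claim counts are random. One way to judge whether the gap is just sampling noise is the usual large-sample standard error of a Poisson GLM, taken from the Fisher information $\sum_i \lambda_i z_i z_i^\top$ with $z_i = (1, \log x_i)$. The sketch below evaluates it at the simulation parameters, so it needs no random draws, and compares it with the two estimates printed above.

```python
import numpy as np

feature_1 = np.repeat(np.arange(1000, 10001), 100)  # same design as the simulation
lam = feature_1 / 10000                            # true Poisson rate
Z = np.column_stack([np.ones(len(feature_1)), np.log(feature_1)])  # intercept + log feature

# Fisher information of a log-link Poisson GLM: sum_i lambda_i * z_i z_i^T
info = Z.T @ (lam[:, None] * Z)
se = np.sqrt(np.diag(np.linalg.inv(info)))

true_params = np.array([np.log(1 / 10000), 1.0])
estimates = np.array([-9.175760280526253, 0.99610799])  # alpha=0 fit printed above
z_scores = (estimates - true_params) / se
print("standard errors:", se.round(4))
print("z-scores:", z_scores.round(2))
```

Both estimates land roughly 1.2 standard errors from the simulation values, in opposite directions. That is expected: with `log_feature_1` far from 0, the intercept and coefficient estimates are almost perfectly negatively correlated, so they miss together. A gap of this size is ordinary sampling noise.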

In [23]:
df[df.feature_1 == 1000][:1]

| | feature_1 | lambda | claim_number | log_feature_1 | pred_claim_number |
|---|---|---|---|---|---|
| 0 | 1000 | 0.1 | 1 | 6.907755 | 0.100772 |


In [21]:
df[df.feature_1 == 10000][:1]

| | feature_1 | lambda | claim_number | log_feature_1 | pred_claim_number |
|---|---|---|---|---|---|
| 900000 | 10000 | 1.0 | 1 | 9.21034 | 0.998734 |


# Double-Log

What happens if the real-life relationship between the feature and the lambda parameter is already logarithmic?
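Assuming "logarithmic" means the rate is proportional to the log of the feature, $\lambda = c\log x$ for some constant $c > 0$ (an assumption about the setup, with $x > 1$ so that $\log x > 0$), the log link gives

$$
\log\lambda = \log c + 1\cdot\log(\log x),
$$

so the correctly specified covariate would be $\log(\log x)$, with intercept $\log c$ and coefficient 1, which is presumably where the "double log" comes from.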