### Poisson Regression 

This example works through Poisson regression end to end, using data from a Kaggle competition. Source notebook: https://www.kaggle.com/code/hongpeiyi/poisson-regression-with-statsmodels/notebook

In [1]:
# importing the libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler  # for standardization

In [2]:
# load the data

data_path = "data_poisson.csv"
data = pd.read_csv(data_path, index_col="id")

# identify input and output

X = data.drop("loss", axis=1)
y = data.loss
feature_names = X.columns.to_list()

# divide data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# standardise: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_S = scaler.fit_transform(X_train)
X_test_S = scaler.transform(X_test)

# Initialize and train the Poisson Regression model
poisson_model = PoissonRegressor()
poisson_model.fit(X_train_S, y_train)

# Make predictions on the test set
y_pred = poisson_model.predict(X_test_S)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
print(f"R-squared (R2): {r2:.2f}")



Mean Squared Error (MSE): 62.66
Mean Absolute Error (MAE): 6.16
Root Mean Squared Error (RMSE): 7.92
R-squared (R2): -0.00


## Analysis of the baseline model
MSE, MAE, and RMSE measure prediction error; lower values are better. MAE and RMSE are in the same units as `loss`.
R-squared is the proportion of variance in the output that is predictable from the input. Closer to 1 is better.
Here R-squared is -0.00, so the baseline predicts the test set about as well as a constant equal to the mean `loss`. A value this low suggests either that the features carry little signal about `loss`, or that the model's assumptions (a log-linear mean and Poisson-distributed counts) do not hold for this data.
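
To see what an R-squared near zero is being compared against, the sketch below fits a predictor that always returns the training mean. It uses synthetic counts as a stand-in for the real data (an assumption made so the snippet runs on its own); the names `X_demo`, `y_demo`, and `mean_model` are illustrative only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the real data (assumption): counts unrelated to the features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
y_demo = rng.poisson(lam=7.0, size=1000)

X_tr, X_te = X_demo[:800], X_demo[800:]
y_tr, y_te = y_demo[:800], y_demo[800:]

# DummyRegressor(strategy="mean") ignores the features and predicts the training mean.
mean_model = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
y_hat = mean_model.predict(X_te)

baseline_mse = mean_squared_error(y_te, y_hat)
baseline_r2 = r2_score(y_te, y_hat)
print(f"Mean-only MSE: {baseline_mse:.2f}")
print(f"Mean-only R2:  {baseline_r2:.4f}")
```

A constant prediction can never score above zero on R-squared, so a fitted model whose test R-squared rounds to 0.00 is doing roughly what this mean-only predictor does.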


In [3]:
# --- Second experiment: Poisson Regression after PCA ---
print("\n--- Poisson Regression after PCA (10 components) ---")

# Apply PCA to reduce the standardized features to 10 components
pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train_S)
X_test_pca = pca.transform(X_test_S)

# Initialize and train Poisson Regression on PCA-transformed data
poisson_model_pca = PoissonRegressor()
poisson_model_pca.fit(X_train_pca, y_train)

# Make predictions on the PCA-transformed test set
y_pred_pca = poisson_model_pca.predict(X_test_pca)

# Evaluate the model after PCA
mse_pca = mean_squared_error(y_test, y_pred_pca)
mae_pca = mean_absolute_error(y_test, y_pred_pca)
rmse_pca = np.sqrt(mse_pca)
r2_pca = r2_score(y_test, y_pred_pca)

print(f"Mean Squared Error (MSE) after PCA: {mse_pca:.2f}")
print(f"Mean Absolute Error (MAE) after PCA: {mae_pca:.2f}")
print(f"Root Mean Squared Error (RMSE) after PCA: {rmse_pca:.2f}")
print(f"R-squared (R2) after PCA: {r2_pca:.2f}")



--- Poisson Regression after PCA (10 components) ---
Mean Squared Error (MSE) after PCA: 62.37
Mean Absolute Error (MAE) after PCA: 6.14
Root Mean Squared Error (RMSE) after PCA: 7.90
R-squared (R2) after PCA: 0.00


### Note that we used the standardised version of the data before applying PCA. Why?

PCA finds the most important "directions" or "axes" in your data, where "important" means "has the most variance". If some features have much larger values than others, PCA can be tricked into treating those features as the most important simply because of their large numbers, not because they tell you more about the overall structure of the data. For example, in a recipe dataset an ingredient recorded in grams would dominate one recorded in teaspoons.

Standardization converts all your measurements to the same "standard unit". Instead of grams and teaspoons, each value is expressed relative to that feature's average and spread (subtract the mean, divide by the standard deviation). This way, PCA focuses on the real variance and relationships in the data, not on differences in the scales of the original measurements.
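
The effect can be shown with a small sketch on synthetic data. The two features below are independent and differ only in scale; the names `small`, `large`, and `X_demo` are illustrative and not taken from the dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent features on very different scales (illustrative data).
small = rng.normal(loc=0.0, scale=1.0, size=500)      # e.g. measured in teaspoons
large = rng.normal(loc=0.0, scale=1000.0, size=500)   # e.g. measured in grams
X_demo = np.column_stack([small, large])

# PCA on the raw data versus PCA on the standardized data.
pca_raw = PCA(n_components=2).fit(X_demo)
pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X_demo))

raw_ratio = pca_raw.explained_variance_ratio_
std_ratio = pca_std.explained_variance_ratio_
print("Raw explained variance ratio:         ", np.round(raw_ratio, 4))
print("Standardized explained variance ratio:", np.round(std_ratio, 4))
print("First raw component (feature weights):", np.round(pca_raw.components_[0], 4))
```

Without scaling, the first component points almost entirely along the large-scale feature. After scaling, neither feature dominates, which is the appropriate outcome for two unrelated features.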

## Comparison of Poisson Regression Performance: Baseline vs. PCA (10 Components)

Here's a comparison of the Poisson Regression model's performance before applying PCA (Baseline) and after applying PCA with 10 components:

| Metric                         | Baseline (Original Features) | PCA (10 Components) |
| ------------------------------ | ---------------------------- | ------------------- |
| Mean Squared Error (MSE)       | 62.66                        | 62.37               |
| Mean Absolute Error (MAE)      | 6.16                         | 6.14                |
| Root Mean Squared Error (RMSE) | 7.92                         | 7.90                |
| R-squared (R2)                 | -0.00                        | 0.00                |

**Analysis of the Metrics:**

* **Mean Squared Error (MSE):** This measures the average of the squared differences between the predicted and actual values. A lower MSE indicates better performance.
    * The MSE decreased slightly from 62.66 in the baseline to 62.37 after PCA. This is a marginal improvement in the overall squared error.

* **Mean Absolute Error (MAE):** This measures the average of the absolute differences between the predicted and actual values. It is easier to interpret than MSE because it is in the same units as the target variable.
    * The MAE also decreased slightly, from 6.16 to 6.14 after PCA, a small reduction in the average absolute error.

* **Root Mean Squared Error (RMSE):** This is the square root of the MSE. It measures the typical magnitude of the errors and is also in the same units as the target variable.
    * The RMSE decreased slightly from 7.92 to 7.90 after PCA, a minor reduction in the typical error magnitude.

* **R-squared (R2):** This metric represents the proportion of the variance in the dependent variable (the target, 'loss' in this case) that is predictable from the independent variables (the features).
    * **Baseline R-squared: -0.00**
        * The printed value is a very small negative number rounded to two decimals.
        * An R-squared of 0 means the model explains none of the variability of the response around its mean. In other words, the model is no better than predicting the average value of the target for every row.
        * A negative R-squared means the model fits the evaluated data worse than a horizontal line at the mean of the dependent variable. This can happen on a held-out test set, and here the shortfall is tiny.
    * **PCA (10 Components) R-squared: 0.00**
        * The R-squared moved from just below zero to just above zero after applying PCA. This is a change in the third decimal place or beyond, and it still means the model explains essentially none of the variance in the target. The model is performing no better than a simple average prediction.

**Overall Judgment:**

Applying PCA with 10 components has resulted in a very slight improvement in the error metrics (MSE, MAE, RMSE). However, the R-squared value remains at or very close to zero.

**What does this mean, especially the R-squared?**

The R-squared values are the most concerning aspect of these results. An R-squared of 0 (or a negative value) indicates that the Poisson Regression model, both with the original features and after PCA with 10 components, is **not effectively capturing the relationship between the input features and the target variable ('loss').**

* **The model is not explaining any of the reasons why the 'loss' values vary.** It's essentially making predictions that are no better than just guessing the average 'loss' value.
* **The features, even after reducing dimensionality with PCA, are not strongly linearly related to the (log of the expected) 'loss' in a way that the Poisson Regression model can learn effectively.**

**In conclusion:** PCA with 10 components slightly reduced the prediction errors, but it did not address the fundamental issue: the model explains essentially none of the variance in the target variable, as the near-zero R-squared shows. Next steps to explore include whether Poisson Regression is appropriate for this data (for example, whether the counts are overdispersed), the quality of the features, and the need for feature engineering.
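
One quick, rough check on whether a Poisson model suits a count target is to compare the target's variance with its mean, since a Poisson variable has variance equal to its mean. The sketch below uses synthetic counts (an assumption, so it runs without the dataset), and `dispersion_ratio` is a hypothetical helper, not a scikit-learn function. It looks only at the marginal distribution, so treat it as a first hint rather than a formal test.

```python
import numpy as np

def dispersion_ratio(y):
    """Return sample variance / sample mean; close to 1 for Poisson counts."""
    y = np.asarray(y, dtype=float)
    return y.var(ddof=1) / y.mean()

rng = np.random.default_rng(0)

# Poisson counts with mean 7: variance should also be about 7.
poisson_counts = rng.poisson(lam=7.0, size=5000)

# Negative binomial counts with the same mean (7) but a much larger variance (56).
overdispersed_counts = rng.negative_binomial(n=1, p=1 / 8, size=5000)

poisson_ratio = dispersion_ratio(poisson_counts)
overdispersed_ratio = dispersion_ratio(overdispersed_counts)
print(f"Poisson sample ratio:       {poisson_ratio:.2f}")
print(f"Overdispersed sample ratio: {overdispersed_ratio:.2f}")
```

On the real data the same helper could be applied as `dispersion_ratio(y_train)`. A ratio far above 1 would suggest the counts are more spread out than a Poisson model expects.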

In [4]:

# --- Poisson Regression with Regularization ---
print("\n--- Poisson Regression with L2 Regularization ---")

# Initialize and train the Poisson Regression model with L2 regularization
# alpha controls the strength of the regularization (higher alpha means stronger).
# Note: alpha=1.0 is also PoissonRegressor's default, so this model matches the baseline above.
poisson_model_reg_l2 = PoissonRegressor(alpha=1.0)
poisson_model_reg_l2.fit(X_train_S, y_train)

# Make predictions on the test set
y_pred_reg_l2 = poisson_model_reg_l2.predict(X_test_S)

# Evaluate the model with L2 regularization
mse_reg_l2 = mean_squared_error(y_test, y_pred_reg_l2)
mae_reg_l2 = mean_absolute_error(y_test, y_pred_reg_l2)
rmse_reg_l2 = np.sqrt(mse_reg_l2)
r2_reg_l2 = r2_score(y_test, y_pred_reg_l2)

print(f"Mean Squared Error (MSE) with L2: {mse_reg_l2:.2f}")
print(f"Mean Absolute Error (MAE) with L2: {mae_reg_l2:.2f}")
print(f"Root Mean Squared Error (RMSE) with L2: {rmse_reg_l2:.2f}")
print(f"R-squared (R2) with L2: {r2_reg_l2:.2f}")




--- Poisson Regression with L2 Regularization ---
Mean Squared Error (MSE) with L2: 62.66
Mean Absolute Error (MAE) with L2: 6.16
Root Mean Squared Error (RMSE) with L2: 7.92
R-squared (R2) with L2: -0.00
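
These numbers are identical to the baseline's. A likely explanation is that scikit-learn's `PoissonRegressor` already applies an L2 penalty with `alpha=1.0` by default, so passing `alpha=1.0` explicitly changes nothing. The sketch below checks this on synthetic data (an assumption, so the snippet is self-contained); it is worth re-running against your installed version.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Read the default penalty strength from an unconfigured estimator.
default_alpha = PoissonRegressor().get_params()["alpha"]
print("Default alpha:", default_alpha)

# Fit a default model and an explicit alpha=1.0 model on the same synthetic data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = rng.poisson(lam=np.exp(1.0 + 0.3 * X_demo[:, 0]))

model_default = PoissonRegressor().fit(X_demo, y_demo)
model_explicit = PoissonRegressor(alpha=1.0).fit(X_demo, y_demo)

same_coefficients = np.allclose(model_default.coef_, model_explicit.coef_)
print("Same coefficients:", same_coefficients)
```

To compare against a model with no penalty at all, the unregularized fit would be `PoissonRegressor(alpha=0)`.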


## Why Regularization Makes Sense for Linear Models (Despite Their Simplicity)

Regularization is often associated with complex models. Linear models like Linear Regression and Poisson Regression seem straightforward, so why would we need to "regularize" them?

Here's the breakdown:

**1. Preventing Overfitting, Even in Linear Models:**

* **What is Overfitting?** Overfitting happens when a model learns the training data *too well*, including the noise and random fluctuations. This leads to excellent performance on the training data but poor performance on new, unseen data (like the test set).
* **Can Linear Models Overfit?** Yes, especially when:
    * **You have a large number of features compared to the number of data points.** In such cases, the linear model can become very flexible and essentially memorize the training data.
    * **There's multicollinearity (high correlation between features).** This can cause the model to assign very large and unstable coefficients to the correlated features.
* **How Regularization Helps:** Regularization adds a penalty to the model's complexity, specifically by penalizing large coefficient values. This forces the model to find a simpler solution that generalizes better to new data.

**2. Improving Generalization:**

* The primary goal of any machine learning model is to generalize well to unseen data. Regularization is a powerful tool to achieve this.
* By shrinking or setting coefficients to zero, regularization makes the model less sensitive to the specific patterns in the training data, including noise. This often leads to better performance on the test set.

**3. Addressing Multicollinearity:**

* As mentioned earlier, multicollinearity can lead to unstable and hard-to-interpret coefficients in linear models.
* **L2 Regularization (the Ridge penalty):** Shrinks the coefficients of correlated features towards each other, reducing their individual impact and stabilizing the model. This is the penalty that scikit-learn's `PoissonRegressor` applies through `alpha`.
* **L1 Regularization (the Lasso penalty):** Can drive the coefficients of less important correlated features to exactly zero, effectively performing feature selection and simplifying the model. `PoissonRegressor` does not offer an L1 penalty, so it is not used in this notebook.

**4. Feature Selection (Especially with L1):**

* Even if you don't have a huge number of features, L1 regularization can be useful for identifying the most important features by setting the coefficients of less relevant features to zero. This can lead to a more interpretable and potentially more efficient model.

**Think of it like this:**

Imagine you're trying to fit a trend through some data points using many features.

* **Without Regularization:** With many features and some outliers, the model can use large coefficients to chase individual points, even the outliers. The fitted trend then depends heavily on the particular sample and doesn't represent the overall pattern well.
* **With Regularization:** The penalty keeps the coefficients small, so the fit is less influenced by individual noisy points. It finds a more general trend that is likely to work better on new data.

**In the context of Poisson Regression:**

Poisson Regression models the log of the expected count as a linear function of the features. It is a generalized linear model: the prediction is not linear in the features, but the log of the prediction is linear in the coefficients. Regularization can therefore help in the same way, preventing overfitting when you have many features or multicollinearity in your count data modeling.

**Key Takeaway:**

Regularization isn't just for complex models like neural networks or decision trees. It's a valuable technique for linear models to improve their generalization ability, handle multicollinearity, and potentially perform feature selection. It's about finding the right balance between fitting the training data well and creating a model that performs reliably on new data.

So, while linear models are simpler than some others, they can still benefit significantly from the application of regularization techniques like L1 and L2.
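
To make "penalizing large coefficient values" concrete, here is a minimal sketch that fits `PoissonRegressor` at several `alpha` values and records the size of the coefficient vector. The data are synthetic and `true_coef` is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Synthetic counts whose log-mean is linear in the features (illustrative only).
X_demo = rng.normal(size=(500, 4))
true_coef = np.array([0.5, -0.3, 0.0, 0.2])
y_demo = rng.poisson(lam=np.exp(1.0 + X_demo @ true_coef))

# Fit the same model with increasingly strong L2 penalties.
alphas = [0.01, 0.1, 1.0, 10.0]
coef_norms = {}
for alpha in alphas:
    model = PoissonRegressor(alpha=alpha, max_iter=1000).fit(X_demo, y_demo)
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))

for alpha in alphas:
    print(f"alpha={alpha:<5} ||coef|| = {coef_norms[alpha]:.3f}")
```

The coefficient norm shrinks as `alpha` grows. Whether that shrinkage helps or hurts test performance depends on the data, which is why `alpha` is usually tuned rather than fixed in advance.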

In [5]:
# --- Poisson Regression with Regularization and parameter tuning ---
print("\n--- Poisson Regression with L2 Regularization ---")

# Initialize and train the Poisson Regression model with a stronger L2 penalty
# alpha controls the strength of the regularization; alpha=10 is ten times the default of 1.0
poisson_model_reg_l2 = PoissonRegressor(alpha=10)
poisson_model_reg_l2.fit(X_train_S, y_train)

# Make predictions on the test set
y_pred_reg_l2 = poisson_model_reg_l2.predict(X_test_S)

# Evaluate the model with L2 regularization
mse_reg_l2 = mean_squared_error(y_test, y_pred_reg_l2)
mae_reg_l2 = mean_absolute_error(y_test, y_pred_reg_l2)
rmse_reg_l2 = np.sqrt(mse_reg_l2)
r2_reg_l2 = r2_score(y_test, y_pred_reg_l2)

print(f"Mean Squared Error (MSE) with L2: {mse_reg_l2:.2f}")
print(f"Mean Absolute Error (MAE) with L2: {mae_reg_l2:.2f}")
print(f"Root Mean Squared Error (RMSE) with L2: {rmse_reg_l2:.2f}")
print(f"R-squared (R2) with L2: {r2_reg_l2:.2f}")



--- Poisson Regression with L2 Regularization ---
Mean Squared Error (MSE) with L2: 62.11
Mean Absolute Error (MAE) with L2: 6.13
Root Mean Squared Error (RMSE) with L2: 7.88
R-squared (R2) with L2: 0.00


### With alpha=10, regularisation gave a small improvement: MSE fell from 62.66 to 62.11
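
Trying one `alpha` value by hand is a start, but the usual approach is to compare several values with cross-validation on the training set. The sketch below does this with `GridSearchCV` on synthetic data (an assumption, so it runs on its own). The grid of values and the choice of mean Poisson deviance as the score are illustrative, not taken from the original notebook.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic counts with a real signal in the first two features (illustrative only).
X_demo = rng.normal(size=(400, 6))
y_demo = rng.poisson(lam=np.exp(1.0 + 0.4 * X_demo[:, 0] - 0.2 * X_demo[:, 1]))

# Candidate penalty strengths, spaced on a log scale.
alpha_grid = {"alpha": [0.001, 0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    PoissonRegressor(max_iter=1000),
    param_grid=alpha_grid,
    scoring="neg_mean_poisson_deviance",  # higher (closer to 0) is better
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["alpha"]
print("Best alpha:", best_alpha)
print("Best CV score (negative mean Poisson deviance):", round(search.best_score_, 4))
```

On the real data the same search would be fitted on `X_train_S` and `y_train`, leaving the test set untouched until the final evaluation.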

In [6]:
# Apply PCA to reduce dimensions to 10, then include regularisation
pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train_S)
X_test_pca = pca.transform(X_test_S)

# Initialize and train Poisson Regression on PCA-transformed data with a weak L2 penalty
poisson_model_pca_reg = PoissonRegressor(alpha=0.001)  # weaker than the default alpha=1.0
poisson_model_pca_reg.fit(X_train_pca, y_train)

# Make predictions on the PCA-transformed test set
y_pred_pca_reg = poisson_model_pca_reg.predict(X_test_pca)

# Evaluate the model after PCA with L2 regularization
mse_pca_reg = mean_squared_error(y_test, y_pred_pca_reg)
mae_pca_reg = mean_absolute_error(y_test, y_pred_pca_reg)
rmse_pca_reg = np.sqrt(mse_pca_reg)
r2_pca_reg = r2_score(y_test, y_pred_pca_reg)

print(f"Mean Squared Error (MSE) after PCA with L2: {mse_pca_reg:.2f}")
print(f"Mean Absolute Error (MAE) after PCA with L2: {mae_pca_reg:.2f}")
print(f"Root Mean Squared Error (RMSE) after PCA with L2: {rmse_pca_reg:.2f}")
print(f"R-squared (R2) after PCA with L2: {r2_pca_reg:.2f}")

Mean Squared Error (MSE) after PCA with L2: 62.63
Mean Absolute Error (MAE) after PCA with L2: 6.14
Root Mean Squared Error (RMSE) after PCA with L2: 7.91
R-squared (R2) after PCA with L2: -0.00


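## Why Regularization Can Hurt After PCA: Summary for Students

On the original features, raising `alpha` from 1.0 to 10 lowered the MSE (62.66 to 62.11), but the PCA run above with `alpha=0.001` did not beat the earlier PCA run (MSE 62.63 versus 62.37). Two caveats apply before reading much into this: `alpha=0.001` is a weaker penalty than the default `alpha=1.0` used in the earlier PCA run, and all of these differences are small and come from a single train/test split. With that in mind, here is how PCA and regularization can interact: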

**Think of it like this:**

* **Original Data:** Imagine you have a messy room with lots of stuff, some useful, some junk. L2 regularization is like tidying up: it pushes the junk into the background (shrinks the coefficients of less useful features) rather than throwing anything out.
* **PCA:** PCA is like taking a picture of the room that captures the most important things in a simplified way. You lose some details, but you see the main structure.

**Why Regularization Might Hurt *After* PCA:**

1.  **PCA Already Simplified:** PCA already reduced the number of "things" (features) and kept what it thought was most important. Trying to "tidy up" *again* after PCA might be like throwing away something important that PCA decided to keep in the picture.
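2.  **PCA Changes the "Things":** The "things" in the picture (principal components) are combinations of the original items. PCA chooses them by variance in the features, not by how much they tell you about the target, and the L2 penalty shrinks every component's coefficient in the same way, so it can shrink an informative component too far.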
3.  **Too Much Cleaning:** If you strongly regularize after PCA, you might be cleaning *too much*, making the model too simple and missing important patterns.

**In Simple Words:**

* Regularization on the original data helps manage messy features.
* PCA already tries to make the features cleaner and fewer.
* Applying regularization *after* PCA can be like overdoing it and removing important information that PCA kept.

**What to Do:**

* **Experiment:** Try using less regularization (smaller `alpha`) after PCA.
* **Think Carefully:** PCA and regularization both try to simplify the model. You might not need to do both strongly.
* **Look at Results:** Always check your scores (like R-squared) to see what works best.

**Key Idea:** PCA and regularization can sometimes overlap in what they do. If PCA has already simplified the data, adding too much regularization might be unnecessary or even harmful.
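
One way to follow the "Experiment" advice systematically is to put scaling, PCA, and the regressor in a single `Pipeline` and search over the number of components and `alpha` together. This is a sketch on synthetic data; the step names (`scale`, `pca`, `poisson`) and the grid values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic counts driven by one of eight features (illustrative only).
X_demo = rng.normal(size=(300, 8))
y_demo = rng.poisson(lam=np.exp(1.0 + 0.3 * X_demo[:, 0]))

# Scaling and PCA are refitted inside each CV fold, which avoids leaking
# information from the validation fold into the preprocessing.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("poisson", PoissonRegressor(max_iter=1000)),
])

param_grid = {
    "pca__n_components": [2, 4, 8],
    "poisson__alpha": [0.001, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, scoring="neg_mean_poisson_deviance", cv=3)
search.fit(X_demo, y_demo)

print("Best settings:", search.best_params_)
print("Number of combinations tried:", len(search.cv_results_["params"]))
```

Searching both settings together shows directly whether a given amount of regularization still helps once PCA has reduced the features, instead of inferring it from two separate runs.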

In [12]:
# Polynomial poisson Regression

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.decomposition import PCA
import numpy as np

# Load the data
data_path = "data_poisson.csv"
data = pd.read_csv(data_path, index_col="id")

# Identify input and output
X = data.drop("loss", axis=1)
y = data.loss
feature_names = X.columns.to_list()

# Divide data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- 1. Polynomial Poisson Regression (Degree 2) - Increased max_iter ---
print("\n--- Polynomial Poisson Regression (Degree 2) - Increased max_iter ---")

# Create polynomial features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Scale the polynomial features
scaler_poly = StandardScaler()
X_train_poly_scaled = scaler_poly.fit_transform(X_train_poly)
X_test_poly_scaled = scaler_poly.transform(X_test_poly)

# Initialize and train Poisson Regression with increased max_iter
poisson_model_poly = PoissonRegressor(max_iter=500)
poisson_model_poly.fit(X_train_poly_scaled, y_train)

# Make predictions
y_pred_poly = poisson_model_poly.predict(X_test_poly_scaled)

# Evaluate
mse_poly = mean_squared_error(y_test, y_pred_poly)
mae_poly = mean_absolute_error(y_test, y_pred_poly)
rmse_poly = np.sqrt(mse_poly)
r2_poly = r2_score(y_test, y_pred_poly)

print(f"Mean Squared Error (MSE): {mse_poly:.2f}")
print(f"Mean Absolute Error (MAE): {mae_poly:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse_poly:.2f}")
print(f"R-squared (R2): {r2_poly:.2f}")

# --- 2. Polynomial Poisson Regression (Degree 2) with L2 Regularization - Increased max_iter ---
print("\n--- Polynomial Poisson Regression (Degree 2) with L2 Regularization - Increased max_iter ---")

# Initialize and train Poisson Regression with L2 regularization and increased max_iter
# Note: alpha=1.0 is the default, so this fit is the same as model 1 above
poisson_model_poly_reg = PoissonRegressor(alpha=1.0, max_iter=500)
poisson_model_poly_reg.fit(X_train_poly_scaled, y_train)

# Make predictions
y_pred_poly_reg = poisson_model_poly_reg.predict(X_test_poly_scaled)

# Evaluate
mse_poly_reg = mean_squared_error(y_test, y_pred_poly_reg)
mae_poly_reg = mean_absolute_error(y_test, y_pred_poly_reg)
rmse_poly_reg = np.sqrt(mse_poly_reg)
r2_poly_reg = r2_score(y_test, y_pred_poly_reg)

print(f"Mean Squared Error (MSE): {mse_poly_reg:.2f}")
print(f"Mean Absolute Error (MAE): {mae_poly_reg:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse_poly_reg:.2f}")
print(f"R-squared (R2): {r2_poly_reg:.2f}")

# --- 3. Polynomial Poisson Regression (Degree 2) with PCA (n_components=10) and L2 Regularization ---
print("\n--- Polynomial Poisson Regression (Degree 2) with PCA (n_components=10) and L2 Regularization ---")

# Apply PCA
pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train_poly_scaled)
X_test_pca = pca.transform(X_test_poly_scaled)

# Initialize and train Poisson Regression with L2 regularization
poisson_model_pca_reg = PoissonRegressor(alpha=1.0)
poisson_model_pca_reg.fit(X_train_pca, y_train)

# Make predictions
y_pred_pca_reg = poisson_model_pca_reg.predict(X_test_pca)

# Evaluate
mse_pca_reg = mean_squared_error(y_test, y_pred_pca_reg)
mae_pca_reg = mean_absolute_error(y_test, y_pred_pca_reg)
rmse_pca_reg = np.sqrt(mse_pca_reg)
r2_pca_reg = r2_score(y_test, y_pred_pca_reg)

print(f"Mean Squared Error (MSE): {mse_pca_reg:.2f}")
print(f"Mean Absolute Error (MAE): {mae_pca_reg:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse_pca_reg:.2f}")
print(f"R-squared (R2): {r2_pca_reg:.2f}")


--- Polynomial Poisson Regression (Degree 2) - Increased max_iter ---
Mean Squared Error (MSE): 86.67
Mean Absolute Error (MAE): 6.94
Root Mean Squared Error (RMSE): 9.31
R-squared (R2): -0.39

--- Polynomial Poisson Regression (Degree 2) with L2 Regularization - Increased max_iter ---
Mean Squared Error (MSE): 86.67
Mean Absolute Error (MAE): 6.94
Root Mean Squared Error (RMSE): 9.31
R-squared (R2): -0.39

--- Polynomial Poisson Regression (Degree 2) with PCA (n_components=10) and L2 Regularization ---
Mean Squared Error (MSE): 62.36
Mean Absolute Error (MAE): 6.15
Root Mean Squared Error (RMSE): 7.90
R-squared (R2): 0.00
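
The degree-2 runs without PCA score worse than the baseline (R-squared of -0.39). One contributing factor may be how quickly a polynomial expansion multiplies the number of columns, giving the model many more coefficients to fit. The sketch below counts the columns for a hypothetical feature count; `n_features = 20` is an assumption, since the real dataset's width is not shown here.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 20  # hypothetical; replace with X_train.shape[1] on the real data

# A single dummy row is enough for PolynomialFeatures to work out the output width.
X_demo = np.zeros((1, n_features))
poly = PolynomialFeatures(degree=2).fit(X_demo)
n_expanded = poly.n_output_features_

# Closed form for degree 2 including the bias column: C(n + 2, 2).
n_expected = comb(n_features + 2, 2)

print(f"Original features: {n_features}")
print(f"Degree-2 features: {n_expanded}")
```

The count grows roughly with the square of the number of original features, which is consistent with the third run doing better once PCA compresses the expanded features back down to 10 components.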
