## Linear models do not automatically capture interaction effects between input features. 


#### Dataset : Bikeshare 
#### Goal : Approximate the true bike rental demand


We use regression analysis to understand the relationships, patterns, and possible causal links in data. Often we are interested in how changes in the independent variables (the features) affect our outcome of interest.

### What is conditional dependence?
It describes how the target varies with one specific variable while the other variables are held fixed.<br>

#### In linear models, the target value is modeled as a linear combination of the features.
#### The coefficients of a multiple linear model represent the relationship between a given feature `X` and the target `y`, assuming that all the other features remain constant.

### What is marginal dependence?
It describes how the target varies with one specific variable without holding the others fixed.<br>
For example: features `Sex`, `Age`, `Education`; target `Wage`.<br>
When we plot `Age` against `Wage`, what we see is a marginal dependence.

A feature may not be a good predictor of the target variable on its own, but in the presence of other features it can still help us model the target.
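The difference between the two views can be illustrated with a small synthetic sketch. Everything here (the variables `x1`, `x2`, `z`, `e` and the chosen coefficients) is made up for illustration and is not part of the bike sharing data: `x1` is nearly uncorrelated with `y` marginally, yet its coefficient is clearly non-zero once `x2` is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two correlated synthetic features: x1 shares the component z with x2.
z = rng.normal(size=n)
e = rng.normal(size=n)
x1 = z + e
x2 = z

# With x2 held fixed, y increases by 1 for every unit increase in x1.
y = 1.0 * x1 - 2.0 * x2

# Marginal view: correlation between x1 and y, ignoring x2.
marginal_corr = np.corrcoef(x1, y)[0, 1]

# Conditional view: least-squares coefficients with both features present.
design = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"marginal correlation of x1 with y: {marginal_corr:.3f}")
print(f"fitted coefficients (intercept, x1, x2): {np.round(coef, 3)}")
```

A scatter plot of `x1` against `y` would look like noise here, even though `x1` is essential to the model when `x2` is included.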

### We can use the `PolynomialFeatures` class to model the interaction explicitly.
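As a minimal sketch of why this matters, consider a synthetic target that is purely the product of two features. The data and the names `X_toy`, `y_toy` are assumptions made for illustration only. A plain linear model cannot represent the product, while the same model fitted on interaction features can.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data: the target is a pure interaction, y = x1 * x2.
X_toy = rng.uniform(-1, 1, size=(2000, 2))
y_toy = X_toy[:, 0] * X_toy[:, 1]

# Plain linear model: only additive effects of x1 and x2.
plain = LinearRegression().fit(X_toy, y_toy)
r2_plain = plain.score(X_toy, y_toy)

# Same linear model, but with the product x1 * x2 added as a feature.
with_interaction = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LinearRegression(),
).fit(X_toy, y_toy)
r2_interaction = with_interaction.score(X_toy, y_toy)

print(f"R^2 without interaction term: {r2_plain:.3f}")
print(f"R^2 with interaction term:    {r2_interaction:.3f}")
```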


In [85]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import (
    OneHotEncoder, OrdinalEncoder, MinMaxScaler, SplineTransformer,
    PolynomialFeatures, FunctionTransformer,  # FunctionTransformer lives in sklearn.preprocessing
)

from sklearn.compose import ColumnTransformer
from sklearn.model_selection import TimeSeriesSplit, cross_validate, cross_val_score
from sklearn.pipeline import make_pipeline, FeatureUnion

from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_poisson_deviance, mean_absolute_percentage_error, median_absolute_error

from sklearn.linear_model import Ridge, PoissonRegressor, RidgeCV
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor

from feature_engine.creation import CyclicalFeatures

## Bike Sharing Demand dataset

In [4]:
from sklearn.datasets import fetch_openml

bike_sharing = fetch_openml(
    "Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas"
)
df = bike_sharing.frame

# The target of the prediction problem is the absolute count of bike rentals on an hourly basis.

# Rescale the target (number of hourly bike rentals) to a relative demand,
# so that the mean absolute error is more easily interpreted
# as a fraction of the maximum demand.

## Feature and target
y = df['count']/df['count'].max()

y_count = df.pop('count')

X = df

X.head()

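|   | season | year | month | hour | holiday | weekday | workingday | weather | temp | feel_temp | humidity | windspeed |
|---|--------|------|-------|------|---------|---------|------------|---------|------|-----------|----------|-----------|
| 0 | spring | 0 | 1 | 0 | False | 6 | False | clear | 9.84 | 14.395 | 0.81 | 0.0 |
| 1 | spring | 0 | 1 | 1 | False | 6 | False | clear | 9.02 | 13.635 | 0.8 | 0.0 |
| 2 | spring | 0 | 1 | 2 | False | 6 | False | clear | 9.02 | 13.635 | 0.8 | 0.0 |
| 3 | spring | 0 | 1 | 3 | False | 6 | False | clear | 9.84 | 14.395 | 0.75 | 0.0 |
| 4 | spring | 0 | 1 | 4 | False | 6 | False | clear | 9.84 | 14.395 | 0.75 | 0.0 |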


In [5]:
# The "heavy_rain" category appears only 3 times in our data, so let's merge it into the "rain" category
X['weather'] = X['weather'].replace(to_replace='heavy_rain', value='rain')
X['weather'].value_counts()

clear    11413
misty     4544
rain      1422
Name: weather, dtype: int64

## Feature Engineering and Modeling pipeline

In [32]:
# Cross-validation split for time-ordered data

tscv = TimeSeriesSplit(
    n_splits=5,
    gap=48,  # 2-day gap (48 hourly rows) between train and test
    max_train_size=10000,
    test_size=1000
)

def custom_scoring(est, x, y):
    """Return MAE, MSE and mean Poisson deviance for a fitted estimator."""
    y_pred = est.predict(x)
    # mean_poisson_deviance requires strictly positive predictions, so rows with
    # non-positive predictions (possible with Ridge) are dropped from all metrics.
    mask = y_pred > 0
    mae = mean_absolute_error(y[mask], y_pred[mask])
    mse = mean_squared_error(y[mask], y_pred[mask])
    mpd = mean_poisson_deviance(y[mask], y_pred[mask])
    return {'mean_absolute_error': mae, 'mean_squared_error': mse, 'mean_poisson_deviance': mpd}


def evaluate_pipeline(pipe, X, y, cv):
    """Cross-validate `pipe`, print the metrics averaged over folds, return the raw scores."""
    score = cross_validate(pipe, X, y, cv=cv, scoring=custom_scoring)

    mae = np.mean(score['test_mean_absolute_error'])
    mse = np.mean(score['test_mean_squared_error'])
    mpd = np.mean(score['test_mean_poisson_deviance'])

    result = f'Mean absolute error : {mae}\nMean squared error : {mse}\nMean poisson deviance : {mpd}'

    print(result)
    return score
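To see what the `gap`, `max_train_size`, and `test_size` arguments of `TimeSeriesSplit` do, here is a small sketch on 20 dummy rows. The sizes are scaled down and chosen only for illustration; they are not the values used on the bike sharing data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 20 dummy, time-ordered rows (illustrative only).
toy_rows = np.arange(20).reshape(-1, 1)

toy_cv = TimeSeriesSplit(n_splits=3, gap=2, max_train_size=5, test_size=3)
splits = list(toy_cv.split(toy_rows))

for fold, (train_idx, test_idx) in enumerate(splits):
    # The training window always ends `gap` rows before the test window starts,
    # and never contains more than `max_train_size` rows.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each test window comes strictly after its training window, so the model is never evaluated on rows that precede the data it was trained on.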

## One-hot encode the categorical features

In [7]:
categorical_columns = [col for col in X.select_dtypes(include='category')]

In [21]:
categories = [list(X[col].value_counts().index) for col in X.select_dtypes(include='category')]

In [24]:
categorical_transformer = OneHotEncoder(categories=categories)
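One plausible reason to pass `categories` explicitly is that a training fold may not contain every level of a feature. The sketch below uses a made-up fold with no `rain` rows (the array `fold` is hypothetical) and compares the encoded width with and without a fixed category list.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# All levels we expect for the weather feature.
weather_levels = [["clear", "misty", "rain"]]

# A hypothetical training fold in which "rain" never occurs.
fold = np.array([["clear"], ["misty"], ["clear"]], dtype=object)

# Fixed categories: one column per declared level, even if unseen in this fold.
fixed = OneHotEncoder(categories=weather_levels).fit_transform(fold).toarray()

# Inferred categories: only the levels present in this fold get a column.
inferred = OneHotEncoder().fit_transform(fold).toarray()

print("fixed categories shape:   ", fixed.shape)
print("inferred categories shape:", inferred.shape)
```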

## B-splines to capture the periodicity

In [48]:
def bsplines(period, n_knots, degree=3):
    knots = np.linspace(0, period, n_knots)[:, np.newaxis]
    return SplineTransformer(n_knots=n_knots, degree=degree, knots=knots, extrapolation='periodic')

In [49]:
hour_transformer = bsplines(period=24, n_knots=12, degree=3)
month_transformer = bsplines(period=12, n_knots=6, degree=3)
weekday_transformer = bsplines(period=7, n_knots=4, degree=3)

## Marginal Feature pipeline

In [50]:
marginal_features = ColumnTransformer(transformers=[
    ('category', categorical_transformer, categorical_columns),
    ('hour', hour_transformer, ['hour']),
    ('month', month_transformer, ['month']),
    ('weekday', weekday_transformer, ['weekday']),
], remainder=MinMaxScaler())
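The `remainder=MinMaxScaler()` argument means that every column not named in `transformers` is min-max scaled rather than dropped. A minimal sketch on a made-up two-column frame (the frame `toy` is hypothetical) shows the effect:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical frame: one categorical column, one numeric column.
toy = pd.DataFrame({
    "weather": ["clear", "rain", "clear"],
    "temp": [10.0, 20.0, 30.0],
})

ct = ColumnTransformer(
    transformers=[("category", OneHotEncoder(), ["weather"])],
    remainder=MinMaxScaler(),  # applied to every column not listed above
)

out = ct.fit_transform(toy)
if hasattr(out, "toarray"):  # the result may be sparse, depending on its density
    out = out.toarray()

# Columns: weather=clear, weather=rain, then temp rescaled to [0, 1].
print(out)
```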

## Feature interaction

#### Use the `PolynomialFeatures` class on the coarse-grained, spline-encoded `hour` feature to model the `workingday`/`hour` interaction explicitly without introducing too many new variables:

In [51]:
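hour_workday_transformer = ColumnTransformer(transformers=[
    ('hour', bsplines(period=24, n_knots=12, degree=3), ['hour']),
    # Assumes `workingday` holds the strings "True"/"False" (categorical dtype);
    # the comparison turns it into a boolean indicator column.
    ('workingday', FunctionTransformer(lambda x: x == "True"), ['workingday'])
])

# Keep the original columns and add all pairwise products between them.
hour_workday_interaction = make_pipeline(
    hour_workday_transformer,
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
)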

## Use FeatureUnion to combine marginal and interaction features

In [52]:
Feature_pipeline = make_pipeline(
    FeatureUnion([
        ('marginal', marginal_features),
        ('interaction', hour_workday_interaction)
    ])
)

### Modeling pipeline

In [53]:
def build_model_pipe(model):
    return make_pipeline( Feature_pipeline, model)

## Linear Model : Ridge

In [54]:
ridge = RidgeCV(alphas=np.logspace(-6,6,25))
ridge_pipe = build_model_pipe(model=ridge)
ridge_score = evaluate_pipeline(ridge_pipe, X, y, tscv)

Mean absolute error : 0.07420818793206316
Mean squared error : 0.010042563852702264
Mean poisson deviance : 0.039851909144203765


## Linear Model : Poisson

In [57]:
poisson = PoissonRegressor(alpha=0.0001)
poisson_pipe = build_model_pipe(poisson)
poisson_score = evaluate_pipeline(poisson_pipe, X, y, tscv)

Mean absolute error : 0.05229099306908539
Mean squared error : 0.006598711377222193
Mean poisson deviance : 0.020059090453453002
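For readers unfamiliar with the last metric, the sketch below recomputes the mean Poisson deviance by hand on a few made-up values and compares it with `mean_poisson_deviance`. It also shows that the function rejects non-positive predictions, which is why the scoring function above filters on `y_pred > 0`. The arrays are illustrative and unrelated to the results printed above.

```python
import numpy as np
from scipy.special import xlogy
from sklearn.metrics import mean_poisson_deviance

# Made-up relative demands and predictions.
y_true = np.array([0.0, 0.1, 0.4, 0.8])
y_pred = np.array([0.05, 0.2, 0.3, 0.9])

# Mean of 2 * (y * log(y / y_hat) - y + y_hat); xlogy treats 0 * log(0) as 0.
manual = np.mean(2 * (xlogy(y_true, y_true / y_pred) - y_true + y_pred))
library = mean_poisson_deviance(y_true, y_pred)
print(f"manual: {manual:.6f}  sklearn: {library:.6f}")

# A prediction of exactly zero is not allowed.
try:
    mean_poisson_deviance(y_true, np.array([0.0, 0.2, 0.3, 0.9]))
    raised = False
except ValueError:
    raised = True
print("ValueError on non-positive prediction:", raised)
```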


## Model prediction 

In [63]:
all_data_split = list(tscv.split(X, y))

In [92]:
def visualize_prediction(model, datasplit=4, is_poisson=False, title='Model Performance'):
    train_id, test_id = all_data_split[datasplit]
    pipe = build_model_pipe(model)
    pipe.fit(X.iloc[train_id], y.iloc[train_id])

    y_pred = pipe.predict(X.iloc[test_id])
    y_true = y.iloc[test_id]
    plt.figure(figsize=(15,4))
    plt.plot(y_true.reset_index(drop=True), c='r', alpha=0.6, linewidth=2)
    plt.plot(y_pred, c='g', alpha=0.6, linewidth=3)
    plt.xticks([])

    mape = mean_absolute_percentage_error(y_true, y_pred)
    mad = median_absolute_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    
    # Build the plot title, one metric per line.
    score = (
        f'{title} performance on test split : {datasplit}\n'
        f'Mean Absolute Percentage Error : {mape:0.2f}\n'
        f'Median Absolute Error : {mad:0.2f}\n'
        f'Mean Absolute Error : {mae:0.2f}'
    )

    if is_poisson:
        mpd = mean_poisson_deviance(y_true, y_pred)
        score += f'\nMean Poisson Deviance : {mpd:0.2f}'


    plt.title(score)


In [95]:
visualize_prediction(ridge, datasplit=4, title='Ridge Model')

<img src='./plots/Ridge-model-improvement-with-interaction-feats-testsplit-4.png'>

In [96]:
visualize_prediction(poisson, datasplit=4, is_poisson=True, title='Poisson Model')

<img src='./plots/Poisson-model-improvement-with-interaction-feats-performance-on-testsplit-4.png'>