### Google Colab
Run this cell only when working on the Google Colab platform.

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://github.com/Nak007/Stepwise">
    <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

In [None]:
# Mount Google Drive.
from google.colab import drive
drive.mount('/content/drive')

# Clone the repository that provides the required modules.
# All *.py files will be stored under '/content/Stepwise'.
!git clone 'http://github.com/Nak007/Stepwise.git'

# Change the current directory to where the *.py files are stored.
%cd '/content/Stepwise'

## Example

### 1.1 Linear Regression

In [1]:
import pandas as pd, numpy as np
from sklearn.linear_model import (LinearRegression, 
                                  LogisticRegression, 
                                  Lasso, LassoCV, 
                                  Ridge, RidgeCV, 
                                  ARDRegression, 
                                  BayesianRidge)
from sklearn.datasets import make_regression
from sklearn.datasets import load_iris as data
import statsmodels.api as sm
from Stepwise import *

pd.options.display.float_format = "{:,.3f}".format

Generate a random regression problem.

In [2]:
reg_X, reg_y, coef = make_regression(n_samples=1000, 
                                     n_features=20, 
                                     n_informative=5, 
                                     bias=0.2, noise=0.6, 
                                     shuffle=True, 
                                     coef=True, 
                                     random_state=0)

In [3]:
columns = np.array(["X{}".format(str(n).zfill(2)) for n in range(1, reg_X.shape[1]+1)])
reg_X = pd.DataFrame(reg_X, columns=columns)

Fit an OLS model from `statsmodels` using only the informative features (those with `coef` > 0 in the underlying linear model), and use its summary table as a reference for comparison.

In [4]:
valid_columns = list(columns[coef>0])
regsm_X = reg_X.copy()
regsm_X["intercept"] = 1
model = sm.OLS(reg_y, regsm_X[["intercept"] + valid_columns])
print(model.fit().summary()) 

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.409e+07
Date:                Tue, 09 Jan 2024   Prob (F-statistic):               0.00
Time:                        12:15:54   Log-Likelihood:                -914.94
No. Observations:                1000   AIC:                             1842.
Df Residuals:                     994   BIC:                             1871.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.2213      0.019     11.523      0.0

### 1.2 Forward selection
Forward selection starts from an empty estimator and adds features one at a time, each time choosing the most statistically significant candidate, i.e. one whose p-value is less than or equal to the defined threshold. It stops when no remaining feature is statistically significant. Once added, a feature never leaves.
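
The procedure described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the implementation used by the `Stepwise` package: the names `ols_pvalues`, `forward_select` and `alpha` are hypothetical, and the p-values rely on a normal approximation to the t-distribution, which is an assumption that holds only when there are many more samples than features.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Approximate two-sided p-values of OLS coefficients (intercept first).

    Assumption: the t-distribution is replaced by a standard normal,
    which is reasonable only when n_samples >> n_features.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])      # residual variance
    stderr = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t = beta / stderr
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])

def forward_select(X, y, alpha=0.05):
    """Add the most significant candidate until none passes `alpha`."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # p-value of each candidate when appended to the current model
        pvals = {j: ols_pvalues(X[:, selected + [j]], y)[-1] for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break                                  # nothing significant is left
        selected.append(best)                      # features never leave
        remaining.remove(best)
    return selected

# Toy data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=500)
selected = forward_select(X, y, alpha=0.05)
print(selected)
```

Columns 0 and 3 are expected to enter first; a noise column may still follow if it passes the threshold by chance, which is a known weakness of p-value based selection.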

Create estimator (regressor).

In [5]:
regressor = LinearRegression(fit_intercept=True)
kwargs = dict(estimator=regressor, method="forward")
drop = ["alpha", "r2", "adj_r2", "mse", "method"]

In [6]:
model = StepwiseRegression(**kwargs).fit(reg_X, reg_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 0.222 | 0.019 | 11.594 | 0.0 | 0.185 | 0.26 |
| X12 | 50.507 | 0.02 | 2492.734 | 0.0 | 50.467 | 50.547 |
| X02 | 68.719 | 0.019 | 3608.591 | 0.0 | 68.682 | 68.757 |
| X06 | 49.939 | 0.019 | 2637.142 | 0.0 | 49.902 | 49.976 |
| X01 | 98.622 | 0.019 | 5072.61 | 0.0 | 98.584 | 98.661 |
| X18 | 85.796 | 0.02 | 4358.374 | 0.0 | 85.757 | 85.835 |
| X13 | -0.042 | 0.02 | -2.135 | 0.033 | -0.081 | -0.003 |


### 1.3 Backward elimination
Backward elimination starts from the full estimator and removes the least statistically significant feature one at a time, i.e. one whose p-value is greater than the defined threshold, until all features remaining in the equation are statistically significant. Once removed, a feature never returns.
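
As a rough illustration of this loop (not the `Stepwise` package's own code), the sketch below drops the feature with the largest p-value until every remaining one passes. The helper names `ols_pvalues` and `backward_eliminate` and the parameter `alpha` are hypothetical, and the p-values use a normal approximation to the t-distribution, assumed adequate for large samples.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Approximate two-sided p-values of OLS coefficients (intercept first).

    Assumption: normal approximation to the t-distribution.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])
    stderr = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in beta / stderr])

def backward_eliminate(X, y, alpha=0.05):
    """Start from all features and drop the least significant one each round."""
    kept = list(range(X.shape[1]))
    while kept:
        pvals = ols_pvalues(X[:, kept], y)[1:]     # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break                                  # every remaining feature passes
        kept.pop(worst)                            # features never return
    return kept

# Toy data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=500)
kept = backward_eliminate(X, y, alpha=0.05)
print(kept)
```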

In [7]:
kwargs.update({"method" : "backward"})
model = StepwiseRegression(**kwargs).fit(reg_X, reg_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 0.222 | 0.019 | 11.594 | 0.0 | 0.185 | 0.26 |
| X01 | 98.622 | 0.019 | 5072.61 | 0.0 | 98.584 | 98.661 |
| X02 | 68.719 | 0.019 | 3608.591 | 0.0 | 68.682 | 68.757 |
| X06 | 49.939 | 0.019 | 2637.142 | 0.0 | 49.902 | 49.976 |
| X12 | 50.507 | 0.02 | 2492.734 | 0.0 | 50.467 | 50.547 |
| X13 | -0.042 | 0.02 | -2.135 | 0.033 | -0.081 | -0.003 |
| X18 | 85.796 | 0.02 | 4358.374 | 0.0 | 85.757 | 85.835 |


### 1.4 Stepwise
Stepwise regression combines forward selection and backward elimination. At each step, a new feature that satisfies the entry criterion (p-value) is added and the model is re-evaluated. Any features that no longer pass the p-value criterion are pruned. The process repeats until the set of features no longer changes.
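
The add-then-prune cycle can be sketched as follows. This is a simplified illustration under stated assumptions, not the `Stepwise` package's implementation: `ols_pvalues`, `stepwise_select`, `enter_alpha`, `stay_alpha` and `max_steps` are hypothetical names, one feature is added per step, and the p-values use a normal approximation to the t-distribution.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Approximate two-sided p-values of OLS coefficients (intercept first)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])
    stderr = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in beta / stderr])

def stepwise_select(X, y, enter_alpha=0.05, stay_alpha=0.05, max_steps=100):
    """Alternate one forward step with backward pruning until nothing changes."""
    selected = []
    for _ in range(max_steps):                     # guard against cycling
        before = list(selected)
        # Forward step: add the single most significant remaining feature.
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if remaining:
            pvals = {j: ols_pvalues(X[:, selected + [j]], y)[-1]
                     for j in remaining}
            best = min(pvals, key=pvals.get)
            if pvals[best] <= enter_alpha:
                selected.append(best)
        # Backward step: prune features that no longer pass the criterion.
        while selected:
            pvals = ols_pvalues(X[:, selected], y)[1:]   # skip the intercept
            worst = int(np.argmax(pvals))
            if pvals[worst] <= stay_alpha:
                break
            selected.pop(worst)
        if selected == before:                     # feature set is stable
            break
    return selected

# Toy data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=500)
chosen = stepwise_select(X, y)
print(chosen)
```

Using separate entry and stay thresholds is a common design choice: a looser entry threshold lets more candidates in, while the stay threshold decides which of them survive pruning.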

In [8]:
kwargs.update({"method" : "stepwise"})
model = StepwiseRegression(**kwargs).fit(reg_X, reg_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 0.222 | 0.019 | 11.594 | 0.0 | 0.185 | 0.26 |
| X12 | 50.507 | 0.02 | 2492.734 | 0.0 | 50.467 | 50.547 |
| X02 | 68.719 | 0.019 | 3608.591 | 0.0 | 68.682 | 68.757 |
| X06 | 49.939 | 0.019 | 2637.142 | 0.0 | 49.902 | 49.976 |
| X01 | 98.622 | 0.019 | 5072.61 | 0.0 | 98.584 | 98.661 |
| X18 | 85.796 | 0.02 | 4358.374 | 0.0 | 85.757 | 85.835 |
| X13 | -0.042 | 0.02 | -2.135 | 0.033 | -0.081 | -0.003 |


### 1.5 Regressors
- [OLS](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)
- [Lasso](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html)
- [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html)
- [Ridge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge)
- [RidgeCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV)
- [ARDRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ARDRegression.html#sklearn.linear_model.ARDRegression)
- [BayesianRidge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html#sklearn.linear_model.BayesianRidge)

In [9]:
# Collection of regressors
regressors = {"OLS"     : LinearRegression(), 
              "Lasso"   : Lasso(alpha=0.1,), 
              "LassoCV" : LassoCV(n_alphas=100, cv=3, random_state=0), 
              "Ridge"   : Ridge(alpha=0.1), 
              "RidgeCV" : RidgeCV(cv=3), 
              "ARD"     : ARDRegression(n_iter=20),
              "Bayes"   : BayesianRidge(n_iter=20)}

# Compile all keyword arguments
kwargs = dict()
for name,regressor in regressors.items():
    regressor.fit_intercept = True
    kwargs[name] = dict(estimator=regressor, method="stepwise")

# Actual coefficients
compare = pd.DataFrame({"feature" : columns[coef>0], 
                        "Actual"  : coef[coef>0]}).set_index("feature")

In [10]:
for key,value in kwargs.items():
    model = StepwiseRegression(**value).fit(reg_X, reg_y)
    results = model.results_[model.n_iters-1]
    results = pd.DataFrame(results).set_index("feature").drop(columns=drop)
    results = results[["coef"]].rename(columns={"coef":key})
    compare = compare.merge(results, how="outer", 
                            left_index=True, right_index=True)

In [11]:
compare

| feature | Actual | OLS | Lasso | LassoCV | Ridge | RidgeCV | ARD | Bayes |
|---|---|---|---|---|---|---|---|---|
| X01 | 98.657 | 98.622 | 98.521 | 98.525 | 98.612 | 98.612 | 98.624 | 98.622 |
| X02 | 68.735 | 68.719 | 68.612 | 68.617 | 68.712 | 68.712 | 68.718 | 68.719 |
| X06 | 49.929 | 49.939 | 49.845 | 49.849 | 49.935 | 49.935 | 49.938 | 49.939 |
| X12 | 50.499 | 50.507 | 50.391 | 50.396 | 50.501 | 50.501 | 50.507 | 50.507 |
| X13 |  | -0.042 |  |  | -0.042 | -0.042 |  | -0.042 |
| X18 | 85.794 | 85.796 | 85.693 | 85.697 | 85.787 | 85.787 | 85.796 | 85.796 |
| intercept |  | 0.222 | 0.212 | 0.213 | 0.222 | 0.222 | 0.221 | 0.222 |


### 2.1 Logistic Regression

In [12]:
cls_X = pd.DataFrame(data().data, columns=data().feature_names)
cls_y = (data().target==1).astype(int)

In [13]:
smcls_X = cls_X.copy()
smcls_X["intercept"] = 1
model = sm.Logit(cls_y, smcls_X).fit(maxiter=200, method='lbfgs')
print(model.summary())

                           Logit Regression Results                           
Dep. Variable:                      y   No. Observations:                  150
Model:                          Logit   Df Residuals:                      145
Method:                           MLE   Df Model:                            4
Date:                Tue, 09 Jan 2024   Pseudo R-squ.:                  0.2403
Time:                        12:16:13   Log-Likelihood:                -72.535
converged:                       True   LL-Null:                       -95.477
Covariance Type:            nonrobust   LLR p-value:                 2.603e-09
                        coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------
sepal length (cm)    -0.2454      0.650     -0.378      0.706      -1.518       1.028
sepal width (cm)     -2.7966      0.784     -3.569      0.000      -4.332      -1.261
petal length (cm)     1.

### 2.2 Forward selection

Create estimator (classifier).

In [14]:
classifier = LogisticRegression(random_state=0, 
                                fit_intercept=True, 
                                solver="lbfgs", 
                                penalty='none')
kwargs = dict(estimator=classifier)
drop = ["alpha", "gini", "ks", "method"]

In [15]:
kwargs.update({"method" : "forward"})
model = StepwiseRegression(**kwargs).fit(cls_X, cls_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 8.891 | 1.872 | 4.75 | 0.0 | 5.192 | 12.59 |
| sepal width (cm) | -3.222 | 0.637 | -5.057 | 0.0 | -4.482 | -1.963 |


### 2.3 Backward elimination

In [16]:
kwargs.update({"method" : "backward"})
model = StepwiseRegression(**kwargs).fit(cls_X, cls_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 6.951 | 2.226 | 3.122 | 0.002 | 2.551 | 11.35 |
| sepal width (cm) | -2.957 | 0.667 | -4.434 | 0.0 | -4.275 | -1.639 |
| petal length (cm) | 1.125 | 0.462 | 2.436 | 0.016 | 0.212 | 2.038 |
| petal width (cm) | -2.615 | 1.082 | -2.418 | 0.017 | -4.752 | -0.477 |


### 2.4 Stepwise

In [17]:
kwargs.update({"method" : "stepwise"})
model = StepwiseRegression(**kwargs).fit(cls_X, cls_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 8.891 | 1.872 | 4.75 | 0.0 | 5.192 | 12.59 |
| sepal width (cm) | -3.222 | 0.637 | -5.057 | 0.0 | -4.482 | -1.963 |


### 2.4.1 Stepwise
To obtain a result similar to `backward` elimination, some of the parameters must be adjusted.
- Relax the entry criterion to allow more features into the equation, e.g. `fwd_alpha` = 0.7.
- Allow forward selection to add more than one feature, or to continue until there are no remaining statistically significant features, i.e. `add_features` = None.

In [18]:
adj_kwargs = kwargs.copy()
adj_kwargs.update({"method"       : "stepwise", 
                   "fwd_alpha"    : 0.7, 
                   "add_features" : None})
model = StepwiseRegression(**adj_kwargs).fit(cls_X, cls_y)
results = model.results_[model.n_iters-1]
pd.DataFrame(results).set_index("feature").drop(columns=drop)

| feature | coef | stderr | t | pvalue | lower | upper |
|---|---|---|---|---|---|---|
| intercept | 6.951 | 2.226 | 3.122 | 0.002 | 2.551 | 11.35 |
| sepal width (cm) | -2.957 | 0.667 | -4.434 | 0.0 | -4.275 | -1.639 |
| petal width (cm) | -2.615 | 1.082 | -2.418 | 0.017 | -4.752 | -0.477 |
| petal length (cm) | 1.125 | 0.462 | 2.436 | 0.016 | 0.212 | 2.038 |
