# Model Selection
Model selection is an important topic in statistical modeling. We can include as many features as we like, but not all of them improve the model's accuracy, so it is important to know which features actually help. There are two different approaches:

- Model selection by fitting candidate models and comparing them with a statistical measure such as AIC or BIC.
    - Forward stepwise selection
    - Backward stepwise selection

- Using a shrinkage method, such as ridge regression or the lasso.
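
As a rough illustration of the first approach, the sketch below computes AIC and BIC for a least-squares fit, assuming Gaussian errors and dropping additive constants. The helper name `gaussian_aic_bic` and the synthetic data are assumptions made for this example; they are not part of the Credit analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def gaussian_aic_bic(X, y):
    """AIC and BIC of an OLS fit, up to an additive constant (Gaussian errors assumed)."""
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    k = p + 1  # number of slopes plus the intercept
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + np.log(n) * k
    return aic, bic

# Synthetic data (hypothetical): only the first column carries signal.
rng = np.random.default_rng(0)
n = 200
X_demo = rng.normal(size=(n, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(size=n)

aic_small, bic_small = gaussian_aic_bic(X_demo[:, :1], y_demo)
aic_full, bic_full = gaussian_aic_bic(X_demo, y_demo)
print('one feature   :', aic_small, bic_small)
print('three features:', aic_full, bic_full)
```

Lower values are better for both criteria. BIC charges `log(n)` per parameter instead of 2, so it tends to favor smaller models once `n` exceeds about 7.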

# Load file
Two libraries are commonly used to load CSV files:
- NumPy functions `np.loadtxt` and `np.genfromtxt`
- pandas function `pd.read_csv`

Here we prefer pandas.

In [None]:
import pandas as pd
path='data/'
filename = path+'Credit.csv'
credit = pd.read_csv(filename)

In [None]:
credit.head()


In [None]:
credit = credit[['Balance','Income', 'Limit', 'Rating', 'Cards', 'Age', 'Education']]

In [None]:
credit

In [None]:
from pandas.plotting import scatter_matrix
%matplotlib inline
scatter_matrix(credit, alpha=0.2);

In [None]:
# Manual forward stepwise selection, restricted to 'Income', 'Limit' and 'Rating'

# Model size = 1: fit each single-feature model and keep the feature with the highest R^2.
# Then go to the next step while keeping that feature in the model.

from sklearn.linear_model import LinearRegression
y = credit['Balance'].values
X = credit[['Income']].values
lr = LinearRegression()
lr.fit(X,y)
score1 = lr.score(X,y)


y = credit['Balance'].values
X = credit[['Limit']].values
lr = LinearRegression()
lr.fit(X,y)
score2 = lr.score(X,y)

y = credit['Balance'].values
X = credit[['Rating']].values
lr = LinearRegression()
lr.fit(X,y)
score3 = lr.score(X,y)

# score3 > score2 > score1, so Rating enters first
# Model size = 2: add each remaining feature to Rating in turn

y = credit['Balance'].values
X = credit[['Rating', 'Income']].values
lr = LinearRegression()
lr.fit(X,y)
score31 = lr.score(X,y)

y = credit['Balance'].values
X = credit[['Rating', 'Limit']].values
lr = LinearRegression()
lr.fit(X,y)
score32 = lr.score(X,y)

# score31>score32 so Income enters as the second attribute to the model.


In [None]:
print(score31, score32)
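
The manual steps above can be automated with a short greedy loop. This is a minimal sketch, assuming training R^2 as the selection score (as in the cell above); the function name `forward_stepwise` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X, y, names):
    """Greedy forward selection by training R^2; returns (name, R^2) in entry order."""
    remaining = list(range(X.shape[1]))
    selected, order = [], []
    while remaining:
        # Score every candidate model that adds one more column.
        scores = {}
        for j in remaining:
            cols = selected + [j]
            scores[j] = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        order.append((names[best], scores[best]))
    return order

# Synthetic data (hypothetical): 'c' is the strongest feature, 'a' is weaker, 'b' is noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 3))
y_demo = 3.0 * X_demo[:, 2] + 1.0 * X_demo[:, 0] + 0.1 * rng.normal(size=300)

path = forward_stepwise(X_demo, y_demo, ['a', 'b', 'c'])
for name, r2 in path:
    print(name, round(r2, 3))
```

Training R^2 never decreases as features are added, so this loop only orders the features. Choosing where to stop still needs a criterion such as AIC, BIC or cross-validation.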

In [None]:
from sklearn.linear_model import Ridge
import numpy as np
y = credit['Balance'].values
X = credit[['Income', 'Limit', 'Rating', 'Cards', 'Age', 'Education']].values
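# alpha=0 removes the penalty, so the fit coincides with ordinary least squares.
# Note: the `normalize` argument was removed in scikit-learn 1.2, so this cell needs an older release.
rr = Ridge(alpha=0, normalize=True)
rr.fit(X, y)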

In [None]:
X_pred = np.array([15, 3000, 300, 2, 34, 16]).reshape(1,6)
rr.predict(X_pred)


In [None]:
rr = Ridge(alpha=10, normalize=True)
rr.fit(X, y) 
rr.predict(X_pred)
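
To see what the penalty does to the coefficients, the sketch below fits ridge regression for increasing `alpha` on synthetic data and records the length of the coefficient vector. It standardizes the features with `StandardScaler` in a pipeline rather than `normalize=True`, which recent scikit-learn releases no longer accept; the two scale the penalty differently, so the `alpha` values here are not comparable to the ones used on the Credit data. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical), unrelated to the Credit file.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([4.0, -2.0, 0.0, 1.0]) + rng.normal(size=100)

norms = []
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_demo, y_demo)
    coef = model.named_steps['ridge'].coef_
    norms.append(np.linalg.norm(coef))
    # A larger alpha pulls every coefficient toward zero.
    print(alpha, np.round(coef, 3))
```

Ridge shrinks the coefficients but does not set them exactly to zero, so all features stay in the model.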


# Cross-validation
Choosing a good penalization constant is like choosing the model size in linear regression. Let's use cross-validation.
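
Before relying on a built-in estimator, it may help to see the idea spelled out. The sketch below runs 5-fold cross-validation by hand over a grid of penalties and keeps the one with the lowest mean squared error. The synthetic data, the grid `alphas_demo` and the use of a `StandardScaler` pipeline are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(120, 5))
y_demo = X_demo @ np.array([3.0, 0.0, -1.5, 0.0, 0.5]) + rng.normal(size=120)

alphas_demo = np.logspace(-3, 3, 13)
folds = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = []
for alpha in alphas_demo:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # scikit-learn reports negated MSE so that larger is always better.
    scores = cross_val_score(model, X_demo, y_demo, cv=folds,
                             scoring='neg_mean_squared_error')
    cv_mse.append(-scores.mean())

best_alpha = alphas_demo[int(np.argmin(cv_mse))]
print(best_alpha, min(cv_mse))
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the held-out fold does not leak into the preprocessing.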

In [None]:
from sklearn.linear_model import RidgeCV
alpha_values = np.linspace(0.0001, 0.01, num=100)
rrcv = RidgeCV(alphas=alpha_values, normalize=True, store_cv_values=True)
rrcv.fit(X, y)


# Visualize
The cross-validation error is easy to track as a function of the penalization parameter.

In [None]:
# Sum the per-sample leave-one-out squared errors to get one total per alpha.
cv_values = np.sum(rrcv.cv_values_, axis=0)

In [None]:
import matplotlib.pyplot as plt
plt.plot(alpha_values, cv_values, 'or');

In [None]:
rrcv.predict(X_pred)


# Lasso
Fitting the lasso is similar to fitting ridge regression.
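
The practical difference from ridge is that the lasso can set coefficients exactly to zero. The sketch below illustrates this on synthetic data where only two of five features matter; the data, the `alpha` values and the `StandardScaler` pipeline are assumptions for this example and are not tuned for the Credit data.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): only columns 0 and 3 carry signal.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([5.0, 0.0, 0.0, -3.0, 0.0]) + rng.normal(size=200)

counts, coefs_by_alpha = [], {}
for alpha in [0.01, 0.5, 10.0]:
    model = make_pipeline(StandardScaler(), Lasso(alpha=alpha))
    model.fit(X_demo, y_demo)
    coef = model.named_steps['lasso'].coef_
    coefs_by_alpha[alpha] = coef
    counts.append(int(np.sum(coef != 0)))
    # Report how many features survive at this penalty.
    print(alpha, counts[-1], np.round(coef, 3))
```

As `alpha` grows, features drop out of the model one by one, so the lasso performs feature selection and shrinkage at the same time.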



In [None]:
from sklearn.linear_model import Lasso
lr = Lasso(alpha=0.1, normalize=True)
lr.fit(X,y)
lr.predict(X_pred)

# Cross-validation
The penalization value needs to be estimated. As with ridge regression, we use cross-validation to estimate it.

In [None]:
from sklearn.linear_model import LassoCV

lrcv = LassoCV(alphas=alpha_values, cv=10, normalize=True)
lrcv.fit(X, y)


In [None]:
lrcv.alpha_

In [None]:
lrcv.predict(X_pred)

# LARS
Least angle regression (LARS) is a computationally fast method for finding the solution path of the coefficients as the penalization constant alpha varies.
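
To make the shape of the returned path concrete, here is a small sketch on synthetic data, assuming centered and scaled features and a centered response (`lars_path` does not fit an intercept itself). The data and the `_demo` variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic data (hypothetical), standardized by hand.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(150, 4))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
y_demo = X_demo @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.5 * rng.normal(size=150)
y_demo = y_demo - y_demo.mean()

alphas_demo, active, coefs_demo = lars_path(X_demo, y_demo, method='lasso')

# coefs_demo has one row per feature and one column per knot on the path.
print(coefs_demo.shape, alphas_demo.shape)
# The path starts at the all-zero model, where alpha is largest.
print(np.round(coefs_demo[:, 0], 3))
# The last column is the least penalized fit on the path.
print(np.round(coefs_demo[:, -1], 3))
```

Each column of `coefs_demo` is the coefficient vector at the corresponding entry of `alphas_demo`, which is why indexing with `coefs[:, 1]` gives the model right after the first feature enters.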

In [None]:
import matplotlib.pyplot as plt
from sklearn.linear_model import lars_path
alphas, _, coefs = lars_path(X, y, method='lasso', verbose=True)

In [None]:
alphas[1]

In [None]:
coefs[:,1]