## Regression: Ridge & Lasso Regressions on College Applications

We use the College dataset to predict the number of applications received (`Apps`) from the other variables in the dataset, and then apply ridge and lasso regularization to study their effect on the model.

In [26]:
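# Imports used throughout this section
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LassoCV, Ridge, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

# Load data, encode Private as 0/1, rename columns, then check head
data = pd.read_csv('College.csv')
data.set_index('Names', inplace=True)
data['Private'] = (data['Private'] == 'Yes').astype(int)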
data = data.rename(columns = {'Grad.Rate':'Grad_Rate',
                              'S.F.Ratio': 'S_F_Ratio',
                              'perc.alumni':'perc_alumni',
                              'Room.Board':'Room_Board',
                              'F.Undergrad':'F_Undergrad',
                              'P.Undergrad':'P_Undergrad'})
data.head()

Names,Private,Apps,Accept,Enroll,Top10perc,Top25perc,F_Undergrad,P_Undergrad,Outstate,Room_Board,Books,Personal,PhD,Terminal,S_F_Ratio,perc_alumni,Expend,Grad_Rate
Abilene Christian University,1,1660,1232,721,23,52,2885,537,7440,3300,450,2200,70,78,18.1,12,7041,60
Adelphi University,1,2186,1924,512,16,29,2683,1227,12280,6450,750,1500,29,30,12.2,16,10527,56
Adrian College,1,1428,1097,336,22,50,1036,99,11250,3750,400,1165,53,66,12.9,30,8735,54
Agnes Scott College,1,417,349,137,60,89,510,63,12960,5450,450,875,92,97,7.7,37,19016,59
Alaska Pacific University,1,193,146,55,16,44,249,869,7560,4120,800,1500,76,72,11.9,2,10922,15


In [27]:
#Split the dataset into a training set and a test set
train, test = train_test_split(data, test_size=0.2, random_state=1)

Now we'll fit a linear model on the training set using `statsmodels`, with `Apps` as the target variable, and compute the MSE it obtains on the test set.

In [29]:
X = list(data.columns)
X.remove('Apps')

X_train = sm.add_constant(train[X])
y_train = train['Apps']
X_test = sm.add_constant(test[X])
y_test = test['Apps']

# Fit the linear regression model
model = sm.OLS(y_train, X_train).fit()

model.summary()



Dep. Variable:,Apps,R-squared:,0.923
Model:,OLS,Adj. R-squared:,0.92
Method:,Least Squares,F-statistic:,422.6
Date:,"Tue, 17 Oct 2023",Prob (F-statistic):,1.77e-321
Time:,14:37:29,Log-Likelihood:,-5220.3
No. Observations:,621,AIC:,10480.0
Df Residuals:,603,BIC:,10560.0
Df Model:,17,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-680.7685,493.753,-1.379,0.168,-1650.454,288.917
Private,-355.4791,164.818,-2.157,0.031,-679.166,-31.792
Accept,1.5759,0.046,34.022,0.000,1.485,1.667
Enroll,-0.7461,0.230,-3.246,0.001,-1.197,-0.295
Top10perc,50.2187,6.708,7.487,0.000,37.046,63.392
Top25perc,-14.9466,5.413,-2.761,0.006,-25.576,-4.317
F_Undergrad,0.0525,0.040,1.322,0.187,-0.025,0.130
P_Undergrad,0.0149,0.046,0.323,0.747,-0.076,0.106
Outstate,-0.1016,0.023,-4.438,0.000,-0.147,-0.057

Omnibus:,367.855,Durbin-Watson:,2.052
Prob(Omnibus):,0.0,Jarque-Bera (JB):,6382.346
Skew:,2.27,Prob(JB):,0.0
Kurtosis:,18.035,Cond. No.,187000.0


In [30]:
y_pred = model.predict(sm.add_constant(test[X]))

test_MSE = mean_squared_error(test['Apps'], y_pred)
test_MSE



640045.0279060608
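
For reference, the test MSE is the mean of the squared prediction errors, so it is expressed in squared applications. Its square root (the RMSE) is in the same units as `Apps`: an MSE of about 640,045 corresponds to an RMSE of roughly 800 applications. A minimal sketch of the calculation, using made-up numbers rather than values from the dataset (the helper name `mse` is hypothetical):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values only (not taken from College.csv)
y_true_demo = [1000, 2000, 3000]
y_pred_demo = [1100, 1900, 3300]

err = mse(y_true_demo, y_pred_demo)
print(err)                  # in squared units of the target
print(float(np.sqrt(err)))  # RMSE, in the same units as the target
```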

Now we'll fit a ridge regression, using K-fold cross-validation to choose the $\lambda$ that minimizes MSE.

In [35]:
alphas = np.logspace(2, 3, 50)  # 50 log-spaced lambda values from 10^2 to 10^3
kf = KFold(n_splits=10, shuffle=True, random_state=42)
# Note: the `normalize` argument was removed in scikit-learn 1.2, so this cell requires an older release
ridge_cv = RidgeCV(alphas=alphas, cv=kf, scoring='neg_mean_squared_error', normalize=True)

# Fit the RidgeCV model on the training data
ridge_cv.fit(X_train, y_train)

RidgeCV(alphas=array([ 100.        ,  104.81131342,  109.8541142 ,  115.13953993,
        120.67926406,  126.48552169,  132.57113656,  138.94954944,
        145.63484775,  152.64179672,  159.98587196,  167.68329368,
        175.75106249,  184.20699693,  193.06977289,  202.35896477,
        212.09508879,  222.29964825,  232.99518105,  244.20530945,
        255.95479227,  268.26957953,  281.1768698 ,  294.70517026,
        308.88435965,  323.74575428,...
        372.75937203,  390.69399371,  409.49150624,  429.19342601,
        449.8432669 ,  471.48663635,  494.17133613,  517.94746792,
        542.86754393,  568.9866029 ,  596.36233166,  625.05519253,
        655.12855686,  686.648845  ,  719.685673  ,  754.31200634,
        790.60432109,  828.64277285,  868.51137375,  910.29817799,
        954.09547635, 1000.        ]),
        cv=KFold(n_splits=10, random_state=42, shuffle=True), normalize=True,
        scoring='neg_mean_squared_error')
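
To make the tuning step concrete, here is a minimal sketch of what selecting $\lambda$ by K-fold cross-validation amounts to: for each candidate value, fit on K-1 folds, score on the held-out fold, average the K validation MSEs, and keep the value with the lowest average. It uses closed-form ridge in NumPy on synthetic data. The names `ridge_fit`, `cv_mse`, `X_demo`, and `y_demo` are hypothetical, and this is not how `RidgeCV` is implemented internally. In particular, it does not rescale the predictors the way `normalize=True` does.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def cv_mse(X, y, lam, n_splits=10, seed=42):
    """Average validation MSE of ridge over shuffled K folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for k in range(n_splits):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        b0, b = ridge_fit(X[trn], y[trn], lam)
        scores.append(np.mean((y[val] - (b0 + X[val] @ b)) ** 2))
    return float(np.mean(scores))

# Synthetic data (hypothetical stand-in for the College predictors)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(size=200)

lambdas = np.logspace(-3, 3, 25)
scores = [cv_mse(X_demo, y_demo, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(scores))]
print(best_lambda)
```

Because the same `seed` is used for every candidate, all values of $\lambda$ are scored on identical folds, which keeps the comparison fair.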

In [36]:
# Get the selected lambda (alpha)
ridge_select = ridge_cv.alpha_

# Predict on the test set
predictions = ridge_cv.predict(X_test)

# Calculate Mean Squared Error (MSE) on the test set
test_MSE_ridge = mean_squared_error(y_test, predictions)
print(ridge_select)
print(test_MSE_ridge)

100.0
13146544.500145046
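
The selected value, 100.0, is the smallest value in the grid, which usually means the best $\lambda$ lies below the range searched. Also, the `normalize` argument used above is no longer available in scikit-learn 1.2 and later. The sketch below shows one possible way to handle both points, assuming a recent scikit-learn: standardize the predictors in a pipeline and search a wider grid. The data is synthetic and the names `X_demo`, `y_demo`, and `pipe` are hypothetical. Note that `StandardScaler` divides by the standard deviation, whereas `normalize=True` divided by the L2 norm, so $\lambda$ values from the two approaches are not directly comparable.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with predictors on very different scales
# (hypothetical; with the real data, pass the predictors without the
# constant column, since the estimator fits its own intercept)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 6)) * np.array([1, 10, 100, 1, 5, 50])
y_demo = X_demo @ np.array([2.0, 0.5, 0.01, 0.0, -1.0, 0.02]) + rng.normal(size=300)

alphas = np.logspace(-3, 3, 50)  # wide grid spanning six orders of magnitude
kf = KFold(n_splits=10, shuffle=True, random_state=42)

pipe = make_pipeline(
    StandardScaler(),  # zero mean, unit variance per predictor
    RidgeCV(alphas=alphas, cv=kf, scoring="neg_mean_squared_error"),
)
pipe.fit(X_demo, y_demo)

best_alpha = pipe.named_steps["ridgecv"].alpha_
# Flag a selection on the edge of the grid, which suggests widening it
on_edge = bool(np.isclose(best_alpha, alphas[0]) or np.isclose(best_alpha, alphas[-1]))
print(best_alpha, on_edge)
```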


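Comparing the ridge coefficients at $\lambda = 0$ (which reproduces the OLS fit) with those at the $\lambda$ selected by `RidgeCV`, we can see how strongly the penalty shrinks most of them toward zero. The coefficient on `Accept`, for example, falls from about 1.58 to about 0.015.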

In [38]:
# alpha=0 applies no penalty, so the coefficients match the OLS fit above
ridge1 = Ridge(alpha=0, normalize=True)
ridge1.fit(X_train, y_train)
print(pd.Series(ridge1.coef_, index=X_train.columns))

const            0.000000
Private       -355.479131
Accept           1.575937
Enroll          -0.746077
Top10perc       50.218691
Top25perc      -14.946563
F_Undergrad      0.052458
P_Undergrad      0.014942
Outstate        -0.101565
Room_Board       0.181793
Books           -0.049853
Personal         0.020787
PhD             -8.851477
Terminal        -1.580112
S_F_Ratio       18.210318
perc_alumni     -0.134306
Expend           0.082175
Grad_Rate        9.008058
dtype: float64


In [39]:
# Refit with the lambda selected by cross-validation
ridge1 = Ridge(alpha=ridge_select, normalize=True)
ridge1.fit(X_train, y_train)
print(pd.Series(ridge1.coef_, index=X_train.columns))

const           0.000000
Private       -35.623380
Accept          0.014659
Enroll          0.035786
Top10perc       0.717930
Top25perc       0.660805
F_Undergrad     0.006368
P_Undergrad     0.010873
Outstate        0.000429
Room_Board      0.005126
Books           0.030147
Personal        0.010215
PhD             0.888409
Terminal        0.947271
S_F_Ratio       0.736176
perc_alumni    -0.202565
Expend          0.001861
Grad_Rate       0.376049
dtype: float64
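
The shrinkage seen above follows from the ridge solution $\hat\beta_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$ (on centered data): as $\lambda$ grows, the L2 norm of the coefficient vector decreases toward zero, although individual coefficients need not shrink monotonically. A minimal sketch on synthetic data, where the function `ridge_coefs` and the arrays `X_demo` and `y_demo` are hypothetical:

```python
import numpy as np

def ridge_coefs(X, y, lam):
    """Closed-form ridge coefficients on centered data (intercept unpenalized)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

# Synthetic data (not the College dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([5.0, -3.0, 1.0, 0.0]) + rng.normal(size=100)

norms = []
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    beta = ridge_coefs(X_demo, y_demo, lam)
    norms.append(float(np.linalg.norm(beta)))
    print(f"lambda={lam:>7}: coefs={np.round(beta, 3)}, L2 norm={norms[-1]:.3f}")
```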


Now we'll fit a lasso model, again using K-fold cross-validation to choose the $\lambda$ that minimizes MSE.

In [41]:
alphas = np.logspace(2, 3, 50)  # 50 log-spaced lambda values from 10^2 to 10^3
kf = KFold(n_splits=10, shuffle=True, random_state=42)
lasso_cv = LassoCV(alphas=alphas, cv=kf, normalize=True)

# Fit the LassoCV model on the training data
lasso_cv.fit(X_train, y_train)

LassoCV(alphas=array([ 100.        ,  104.81131342,  109.8541142 ,  115.13953993,
        120.67926406,  126.48552169,  132.57113656,  138.94954944,
        145.63484775,  152.64179672,  159.98587196,  167.68329368,
        175.75106249,  184.20699693,  193.06977289,  202.35896477,
        212.09508879,  222.29964825,  232.99518105,  244.20530945,
        255.95479227,  268.26957953,  281.1768698 ,  294.70517026,
        308.88435965,  323.74575428,...64803062,
        372.75937203,  390.69399371,  409.49150624,  429.19342601,
        449.8432669 ,  471.48663635,  494.17133613,  517.94746792,
        542.86754393,  568.9866029 ,  596.36233166,  625.05519253,
        655.12855686,  686.648845  ,  719.685673  ,  754.31200634,
        790.60432109,  828.64277285,  868.51137375,  910.29817799,
        954.09547635, 1000.        ]),
        cv=KFold(n_splits=10, random_state=42, shuffle=True), normalize=True)

In [42]:
# Get the selected lambda (alpha)
lasso_select = lasso_cv.alpha_

# Predict on the test set
predictions = lasso_cv.predict(X_test)

# Calculate Mean Squared Error (MSE) on the test set
test_MSE_lasso = mean_squared_error(y_test, predictions)
print(lasso_select)
print(test_MSE_lasso)

100.0
6685061.618833576
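
Unlike ridge, the lasso's L1 penalty can set coefficients exactly to zero. One way to see why is through coordinate descent with soft-thresholding, sketched below in NumPy for the objective $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \alpha\lVert\beta\rVert_1$ on synthetic data. This is an illustrative implementation under those assumptions, not the solver `LassoCV` uses. The names `soft_threshold`, `lasso_cd`, `X_demo`, and `y_demo` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/(2n))*||y - Xb||^2 + alpha*||b||_1 on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution added back
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return beta

# Synthetic data: only the first and third predictors matter
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([4.0, 0.0, -3.0, 0.0, 0.0]) + rng.normal(size=200)

beta_unpenalized = lasso_cd(X_demo, y_demo, alpha=0.0)  # no penalty: least squares
beta_lasso = lasso_cd(X_demo, y_demo, alpha=1.0)
print(np.round(beta_unpenalized, 3))
print(np.round(beta_lasso, 3))
```

With the penalty switched on, the coefficients of the irrelevant predictors are thresholded to zero, while the relevant ones are kept but shrunk.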


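The lasso gives a much lower test MSE than ridge (about 6.69 million versus 13.15 million), but both are far worse than the unregularized linear model (about 0.64 million). In both cases cross-validation selected $\lambda = 100$, the smallest value in the grid, which suggests the grid should be extended to smaller values before concluding which model to use.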