# Boston Housing Data

- Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. 
- Purpose: To predict housing prices in the Boston area
- Features:
1. CRIM: per capita crime rate by town
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX: nitric oxides concentration (parts per 10 million)
6. RM: average number of rooms per dwelling
7. AGE: proportion of owner-occupied units built prior to 1940
8. DIS: weighted distances to five Boston employment centres
9. RAD: index of accessibility to radial highways
10. TAX: full-value property-tax rate per $10,000 (this column is named `TAS` in the CSV file)
11. PTRATIO: pupil-teacher ratio by town
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT: % lower status of the population

- Target: MEDV, the median value of owner-occupied homes in $1000's
- Repository: https://raw.githubusercontent.com/sesillim/ai/main/Housing.csv

# 1 How many observations are in the data?

In [None]:
import numpy as np
import pandas as pd
import sklearn


In [None]:
url = 'https://raw.githubusercontent.com/sesillim/ai/main/Housing.csv'
data = pd.read_csv(url)

In [None]:
data.keys()

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAS',
       'PTRATIO', 'B', 'LSTAT', 'MEDV'],
      dtype='object')

In [None]:
data.shape


(506, 14)

**There are 506 observations of 14 variables: 13 features plus the target, MEDV, the median house value in the Boston area.**

In [None]:
data.head()


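| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAS | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |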


# 2 Build a linear regression model. Print the coefficients of the model and assess the fit. 

In [None]:
# Separate the features (X) from the target (y)
X = data.drop(['MEDV'], axis='columns')
y = data['MEDV']

# Hold out 30% of the rows as a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [None]:
# build model and train the model with training set
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)

LinearRegression()

In [None]:
# Baseline: Linear Regression
# possibility of overfitting
df=pd.DataFrame(lr.coef_,X.columns, columns=['Coefficient'])
print(" A linear regression with non-scaled data:")
print(df)
print("R-squared for a training set:", lr.score(X_train, y_train))
print("R-squared for a test set:", lr.score(X_test, y_test))
y_pred=lr.predict(X_test)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

 A linear regression with non-scaled data:
         Coefficient
CRIM       -0.121310
ZN          0.044466
INDUS       0.011342
CHAS        2.511246
NOX       -16.231253
RM          3.859068
AGE        -0.009985
DIS        -1.500270
RAD         0.242143
TAS        -0.011072
PTRATIO    -1.017753
B           0.006814
LSTAT      -0.486738
R-squared for a training set: 0.7645451026942549
R-squared for a test set: 0.6733825506400184
MAE: 3.609904060381819
MSE: 27.195965766883308
RMSE: 5.214975145375413


In [75]:
from sklearn.linear_model import Ridge
ridge10 = Ridge(alpha=10).fit(X_train,y_train)

In [78]:
df_ridge10=pd.DataFrame(ridge10.coef_,X.columns, columns=['Coefficient'])
print("Ridge regression with lambda 10 & non-scaled data:")
print(df_ridge10)
print("R-squared for a training set:", ridge10.score(X_train,y_train))
print("R-squared for a test set:", ridge10.score(X_test,y_test))
y_pred=ridge10.predict(X_test)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Ridge regression with lambda 10 & non-scaled data:
         Coefficient
CRIM       -0.116008
ZN          0.048609
INDUS      -0.047175
CHAS        1.785961
NOX        -1.540350
RM          3.620640
AGE        -0.021636
DIS        -1.309680
RAD         0.211507
TAS        -0.012664
PTRATIO    -0.876398
B           0.007688
LSTAT      -0.524099
R-squared for a training set: 0.7561651276850343
R-squared for a test set: 0.656796352915132
MAE: 3.6459408352561233
MSE: 28.577023840824758
RMSE: 5.345748202153255


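**Here, I built a linear regression model, printed its coefficients, and assessed the fit. R-squared is 0.765 on the training set and 0.673 on the test set, with a test MAE of about 3.61 and a test RMSE of about 5.21 (in $1000's). The gap between training and test R-squared suggests some overfitting. A ridge model with alpha=10 on the same unscaled data scores slightly lower on the test set (R-squared 0.657).**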
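
To make the reported metrics concrete, here is a minimal sketch that computes MAE, MSE, RMSE, and R-squared by hand and checks them against `sklearn.metrics`. The four target and prediction values are hypothetical toy numbers chosen for easy arithmetic, not results from the housing model.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy values (assumption: not taken from the housing data)
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_hat = np.array([25.0, 20.6, 30.7, 35.4])

residuals = y_true - y_hat                 # roughly [-1, 1, 4, -2]
mae = np.mean(np.abs(residuals))           # (1 + 1 + 4 + 2) / 4
mse = np.mean(residuals ** 2)              # (1 + 1 + 16 + 4) / 4
rmse = np.sqrt(mse)

# R-squared compares the model's squared error with that of always
# predicting the mean of y_true
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse, "R-squared:", r2)
print("sklearn R-squared:", metrics.r2_score(y_true, y_hat))
```

Because R-squared is measured against the "always predict the mean" baseline, a value near 0 (or below) means the model is no better than that baseline.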

In [84]:
data.describe()

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAS | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 | 21.2 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 | 25.0 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |


# 3 Examine whether the scaling improves the fit of the linear regression model. You can choose any scaling method for the examination.

In [None]:
# MinMax Scaler (rescales each feature to the [0, 1] range seen in the training set)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
# Standard Scaler (overwrites the MinMax-scaled arrays above; the models below use these)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
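
As a small illustration of what the two scalers do, the sketch below applies each to a single toy column. The values 1, 2, 3 are a made-up example, not housing data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical one-feature toy column
toy = np.array([[1.0], [2.0], [3.0]])

# MinMaxScaler: (x - min) / (max - min), so values land in [0, 1]
minmax = MinMaxScaler().fit_transform(toy)

# StandardScaler: (x - mean) / std, using the population standard deviation
standard = StandardScaler().fit_transform(toy)

print(minmax.ravel())    # [0.  0.5 1. ]
print(standard.ravel())  # approximately [-1.2247  0.  1.2247]
```

In the cells above, the scaler is fit on `X_train` only and then applied to `X_test`, which keeps test-set statistics out of the preprocessing step.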

In [None]:
lr_scale = LinearRegression()
lr_scale.fit(X_train_scaled, y_train)

LinearRegression()

In [None]:
df_scale=pd.DataFrame(lr_scale.coef_,X.columns, columns=['Coefficient'])
print("A linear regression with scaled data:")
print(df_scale)
print("R-squared for a training set:", lr_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", lr_scale.score(X_test_scaled,y_test))
y_pred=lr_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

A linear regression with scaled data:
         Coefficient
CRIM       -1.011901
ZN          1.050280
INDUS       0.079210
CHAS        0.618962
NOX        -1.873691
RM          2.705270
AGE        -0.279573
DIS        -3.097665
RAD         2.096900
TAS        -1.886063
PTRATIO    -2.261105
B           0.582643
LSTAT      -3.440498
R-squared for a training set: 0.7645451026942549
R-squared for a test set: 0.6733825506400195
MAE: 3.6099040603818127
MSE: 27.195965766883212
RMSE: 5.214975145375403


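**Scaling did not change the fit of the linear regression model: R-squared (0.765 on the training set, 0.673 on the test set), MAE, MSE, and RMSE all match the unscaled model to numerical precision. Only the coefficients changed, because each one is now expressed per standard deviation of its feature rather than per original unit. I tried both the MinMax and Standard scaling methods; the models here are fit on the Standard-scaled arrays, since that cell overwrites the MinMax-scaled ones.**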
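
The scaled and unscaled outputs above report the same R-squared (0.6734 on the test set). A minimal sketch on synthetic data illustrates why: ordinary least squares absorbs any per-feature rescaling into its coefficients, so predictions stay the same. The dataset from `make_regression` and the names `X_syn`, `raw_model`, and `scaled_model` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption)
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_syn[:, 0] *= 1000.0  # give one feature a very different scale

Xtr, Xte, ytr, yte = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)
sc = StandardScaler().fit(Xtr)

raw_model = LinearRegression().fit(Xtr, ytr)
scaled_model = LinearRegression().fit(sc.transform(Xtr), ytr)

r2_raw = raw_model.score(Xte, yte)
r2_scaled = scaled_model.score(sc.transform(Xte), yte)
same_predictions = np.allclose(raw_model.predict(Xte),
                               scaled_model.predict(sc.transform(Xte)))

print("R-squared, raw vs scaled:", r2_raw, r2_scaled)
print("Identical predictions:", same_predictions)
# Each scaled coefficient equals the raw coefficient times that feature's std
print(np.allclose(scaled_model.coef_, raw_model.coef_ * sc.scale_))
```

Scaling therefore matters for interpreting coefficient sizes and for penalized models, but not for the fit of an unpenalized linear regression.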




# 4 Examine whether implementing L2 regularization with scaling improves model performance. To optimize the hyperparameter, you have to compare at least 5 different hyperparameter values. 

In [87]:
from sklearn.linear_model import Ridge
ridge10_scale = Ridge(alpha=10).fit(X_train_scaled,y_train)


In [95]:
df_ridge10_scale=pd.DataFrame(ridge10_scale.coef_,X.columns, columns=['Coefficient'])
print("Ridge regression with lambda 10 & scaled data:")
print(df_ridge10_scale)
print("R-squared for a training set:", ridge10_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", ridge10_scale.score(X_test_scaled,y_test))
y_pred=ridge10_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Ridge regression with lambda 10 & scaled data:
         Coefficient
CRIM       -0.933398
ZN          0.901779
INDUS      -0.130099
CHAS        0.653141
NOX        -1.554675
RM          2.780153
AGE        -0.318141
DIS        -2.751812
RAD         1.470846
TAS        -1.327215
PTRATIO    -2.183673
B           0.586609
LSTAT      -3.297412
R-squared for a training set: 0.7633167147255946
R-squared for a test set: 0.6674648670481825
MAE: 3.608315093238685
MSE: 27.68870465973234
RMSE: 5.262005763939483
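
Unlike plain linear regression, ridge with alpha=10 gives different results with and without scaling (test R-squared 0.667 here versus 0.657 on the unscaled data in section 2). The sketch below, on made-up data, shows the mechanism: the L2 penalty acts on coefficient size, and coefficient size depends on each feature's units. The arrays `X_a` and `X_b` are hypothetical and hold the same information in different units.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Hypothetical data: three features with known true coefficients
X_a = rng.normal(size=(100, 3))
y_a = X_a @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Same information, but the first feature is recorded in a unit 1000x smaller
X_b = X_a.copy()
X_b[:, 0] *= 1000.0

# Plain least squares is unaffected by the change of units
ols_same = np.allclose(LinearRegression().fit(X_a, y_a).predict(X_a),
                       LinearRegression().fit(X_b, y_a).predict(X_b))

# Ridge is affected: the rescaled feature needs a tiny coefficient,
# so the penalty barely shrinks it any more
ridge_same = np.allclose(Ridge(alpha=10).fit(X_a, y_a).predict(X_a),
                         Ridge(alpha=10).fit(X_b, y_a).predict(X_b))

print("OLS predictions unchanged:", ols_same)      # True
print("Ridge predictions unchanged:", ridge_same)  # False
```

Standardizing first puts every feature on a comparable footing, so the penalty treats them evenly.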


In [85]:
from sklearn.linear_model import Ridge
ridge100_scale = Ridge(alpha=100).fit(X_train_scaled,y_train)

In [96]:
df_ridge100_scale=pd.DataFrame(ridge100_scale.coef_,X.columns, columns=['Coefficient'])
print("Ridge regression with lambda 100 & scaled data:")
print(df_ridge100_scale)
print("R-squared for a training set:", ridge100_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", ridge100_scale.score(X_test_scaled,y_test))
y_pred=ridge100_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Ridge regression with lambda 100 & scaled data:
         Coefficient
CRIM       -0.715429
ZN          0.557659
INDUS      -0.462932
CHAS        0.675175
NOX        -0.669166
RM          2.685603
AGE        -0.324439
DIS        -1.408948
RAD         0.296800
TAS        -0.580408
PTRATIO    -1.838525
B           0.583344
LSTAT      -2.592446
R-squared for a training set: 0.7389220452609574
R-squared for a test set: 0.6318772220574332
MAE: 3.6863461673345483
MSE: 30.651927772241887
RMSE: 5.536418316225923


In [89]:
from sklearn.linear_model import Ridge
ridge1000_scale = Ridge(alpha=1000).fit(X_train_scaled,y_train)

In [97]:
df_ridge1000_scale=pd.DataFrame(ridge1000_scale.coef_,X.columns, columns=['Coefficient'])
print("Ridge regression with lambda 1000 & scaled data:")
print(df_ridge1000_scale)
print("R-squared for a training set:", ridge1000_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", ridge1000_scale.score(X_test_scaled,y_test))
y_pred=ridge1000_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Ridge regression with lambda 1000 & scaled data:
         Coefficient
CRIM       -0.423946
ZN          0.375638
INDUS      -0.476137
CHAS        0.335314
NOX        -0.341437
RM          1.228876
AGE        -0.305634
DIS        -0.081153
RAD        -0.256937
TAS        -0.441192
PTRATIO    -0.873286
B           0.370682
LSTAT      -1.115034
R-squared for a training set: 0.5371333525433009
R-squared for a test set: 0.4612028177403027
MAE: 4.677071843748582
MSE: 44.863217665624404
RMSE: 6.698001020127155


In [91]:
from sklearn.linear_model import Ridge
ridge5_scale = Ridge(alpha=5).fit(X_train_scaled,y_train)

In [98]:
df_ridge5_scale=pd.DataFrame(ridge5_scale.coef_,X.columns, columns=['Coefficient'])
print("Ridge regression with lambda 5 & scaled data:")
print(df_ridge5_scale)
print("R-squared for a training set:", ridge5_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", ridge5_scale.score(X_test_scaled,y_test))
y_pred=ridge5_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Ridge regression with lambda 5 & scaled data:
         Coefficient
CRIM       -0.967683
ZN          0.966353
INDUS      -0.045437
CHAS        0.639325
NOX        -1.697762
RM          2.750044
AGE        -0.303631
DIS        -2.913691
RAD         1.729902
TAS        -1.550811
PTRATIO    -2.219011
B           0.584446
LSTAT      -3.364826
R-squared for a training set: 0.7641599771836047
R-squared for a test set: 0.6703585664250096
MAE: 3.6099315279962014
MSE: 27.447759329511776
RMSE: 5.23906092057649


In [93]:
from sklearn.linear_model import Ridge
ridge1_scale = Ridge(alpha=1).fit(X_train_scaled,y_train)

In [99]:
df_ridge1_scale=pd.DataFrame(ridge1_scale.coef_,X.columns, columns=['Coefficient'])
print("Ridge regression with lambda 1 & scaled data:")
print(df_ridge1_scale)
print("R-squared for a training set:", ridge1_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", ridge1_scale.score(X_test_scaled,y_test))
y_pred=ridge1_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Ridge regression with lambda 1 & scaled data:
         Coefficient
CRIM       -1.002007
ZN          1.031479
INDUS       0.049805
CHAS        0.623750
NOX        -1.835263
RM          2.715728
AGE        -0.285448
DIS        -3.058900
RAD         2.011591
TAS        -1.806511
PTRATIO    -2.251977
B           0.582930
LSTAT      -3.424557
R-squared for a training set: 0.7645258699709746
R-squared for a test set: 0.6727737684528233
MAE: 3.6101074898564725
MSE: 27.24665632109248
RMSE: 5.2198329782755


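**Here I built ridge (L2) regression models on the scaled data and compared five alpha values (1, 5, 10, 100, 1000). Test R-squared falls as alpha grows: 0.673 (alpha=1), 0.670 (5), 0.667 (10), 0.632 (100), and 0.461 (1000), while test RMSE rises from 5.22 to 6.70. The smallest alpha performs best, but it only matches plain linear regression (test R-squared 0.673), so L2 regularization does not improve performance on this split. Scaling does help ridge itself: at alpha=10 the test R-squared is 0.667 with scaling versus 0.657 without.**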
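
The comparison above picks alpha by looking at the test set. A possible alternative, sketched here under stated assumptions, is to loop over the same alpha grid with cross-validation on the training data only and keep the test set for a final check. The synthetic data from `make_regression` stands in for the housing data, and the names `cv_r2` and `best_alpha` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 13 features, like the housing data (assumption)
X_syn, y_syn = make_regression(n_samples=300, n_features=13, noise=20.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

alphas = [1, 5, 10, 100, 1000]
cv_r2 = {}
for alpha in alphas:
    # The pipeline refits the scaler inside each fold, so no fold
    # sees statistics from its own validation rows
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    cv_r2[alpha] = cross_val_score(model, Xtr, ytr, cv=5, scoring="r2").mean()

best_alpha = max(cv_r2, key=cv_r2.get)
final_model = make_pipeline(StandardScaler(), Ridge(alpha=best_alpha)).fit(Xtr, ytr)

for alpha, score in cv_r2.items():
    print(f"alpha={alpha}: mean CV R-squared = {score:.4f}")
print("best alpha:", best_alpha, "test R-squared:", final_model.score(Xte, yte))
```

A loop like this also replaces the five near-identical cell pairs with one block, which makes it easier to add more alpha values.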

# 5 Examine whether implementing L1 regularization with scaling improves model performance. To optimize the hyperparameter, you have to compare at least 5 different hyperparameter values.

In [101]:
from sklearn.linear_model import Lasso
lasso01_scale = Lasso(alpha=0.1).fit(X_train_scaled,y_train)

In [103]:
df_lasso01_scale=pd.DataFrame(lasso01_scale.coef_,X.columns, columns=['Coefficient'])
print("Lasso regression with lambda 0.1 & scaled data:")
print(df_lasso01_scale)
print("R-squared for a training set:", lasso01_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", lasso01_scale.score(X_test_scaled,y_test))
y_pred=lasso01_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Lasso regression with lambda 0.1 & scaled data:
         Coefficient
CRIM       -0.713303
ZN          0.701583
INDUS      -0.069505
CHAS        0.603998
NOX        -1.444227
RM          2.834286
AGE        -0.089646
DIS        -2.345932
RAD         0.639348
TAS        -0.657379
PTRATIO    -2.163676
B           0.472380
LSTAT      -3.504427
R-squared for a training set: 0.759129270664182
R-squared for a test set: 0.6599647116559878
MAE: 3.6229879738141193
MSE: 28.313208860876916
RMSE: 5.321015773409896


In [None]:
from sklearn.linear_model import Lasso
lasso01 = Lasso(alpha=0.1).fit(X_train,y_train)

In [104]:
df_lasso01=pd.DataFrame(lasso01.coef_,X.columns, columns=['Coefficient'])
print("Lasso regression with lambda 0.1 & non-scaled data:")
print(df_lasso01)
print("R-squared for a training set:", lasso01.score(X_train,y_train))
print("R-squared for a test set:", lasso01.score(X_test,y_test))
y_pred=lasso01.predict(X_test)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Lasso regression with lambda 0.1 & non-scaled data:
         Coefficient
CRIM       -0.113118
ZN          0.047251
INDUS      -0.039925
CHAS        0.964789
NOX        -0.000000
RM          3.722896
AGE        -0.021431
DIS        -1.233704
RAD         0.204690
TAS        -0.012944
PTRATIO    -0.852690
B           0.007958
LSTAT      -0.523924
R-squared for a training set: 0.7531274572554778
R-squared for a test set: 0.6532086050344972
MAE: 3.6392670911559044
MSE: 28.875759467880663
RMSE: 5.373616981873631


In [None]:
from sklearn.linear_model import Lasso
lasso0001_scale = Lasso(alpha=0.001).fit(X_train_scaled,y_train)

In [105]:
df_lasso0001_scale=pd.DataFrame(lasso0001_scale.coef_,X.columns, columns=['Coefficient'])
print("Lasso regression with lambda 0.001 & scaled data:")
print(df_lasso0001_scale)
print("R-squared for a training set:", lasso0001_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", lasso0001_scale.score(X_test_scaled,y_test))
y_pred=lasso0001_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Lasso regression with lambda 0.001 & scaled data:
         Coefficient
CRIM       -1.009102
ZN          1.046310
INDUS       0.069836
CHAS        0.619218
NOX        -1.867103
RM          2.706109
AGE        -0.278011
DIS        -3.091935
RAD         2.078839
TAS        -1.868239
PTRATIO    -2.259217
B           0.581396
LSTAT      -3.440498
R-squared for a training set: 0.7645442937237866
R-squared for a test set: 0.6732910521770625
MAE: 3.609961304106649
MSE: 27.203584432301042
RMSE: 5.215705554601509


In [106]:
from sklearn.linear_model import Lasso
lasso1_scale = Lasso(alpha=1).fit(X_train_scaled,y_train)

In [109]:
df_lasso1_scale=pd.DataFrame(lasso1_scale.coef_,X.columns, columns=['Coefficient'])
print("Lasso regression with lambda 1 & scaled data:")
print(df_lasso1_scale)
print("R-squared for a training set:", lasso1_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", lasso1_scale.score(X_test_scaled,y_test))
y_pred=lasso1_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Lasso regression with lambda 1 & scaled data:
         Coefficient
CRIM       -0.034090
ZN          0.000000
INDUS      -0.000000
CHAS        0.000000
NOX        -0.000000
RM          2.675475
AGE        -0.000000
DIS        -0.000000
RAD        -0.000000
TAS        -0.119743
PTRATIO    -1.784836
B           0.002440
LSTAT      -3.404284
R-squared for a training set: 0.6927580317165543
R-squared for a test set: 0.5999442961470397
MAE: 3.957998413357712
MSE: 33.31083886715502
RMSE: 5.771554285212521


In [110]:
from sklearn.linear_model import Lasso
lasso10_scale = Lasso(alpha=10).fit(X_train_scaled,y_train)

In [111]:
df_lasso10_scale=pd.DataFrame(lasso10_scale.coef_,X.columns, columns=['Coefficient'])
print("Lasso regression with lambda 10 & scaled data:")
print(df_lasso10_scale)
print("R-squared for a training set:", lasso10_scale.score(X_train_scaled,y_train))
print("R-squared for a test set:", lasso10_scale.score(X_test_scaled,y_test))
y_pred=lasso10_scale.predict(X_test_scaled)
from sklearn import metrics
print("MAE:", metrics.mean_absolute_error(y_test,y_pred))
print("MSE:", metrics.mean_squared_error(y_test,y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Lasso regression with lambda 10 & scaled data:
         Coefficient
CRIM            -0.0
ZN               0.0
INDUS           -0.0
CHAS             0.0
NOX             -0.0
RM               0.0
AGE             -0.0
DIS              0.0
RAD             -0.0
TAS             -0.0
PTRATIO         -0.0
B                0.0
LSTAT           -0.0
R-squared for a training set: 0.0
R-squared for a test set: -0.0060197319476869016
MAE: 6.6181274159976216
MSE: 83.76673764512785
RMSE: 9.152417038418204


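**Here I built lasso (L1) regression models on the scaled data, plus one on the unscaled data for comparison. Because alpha=0.1 is used twice (scaled and unscaled), only four distinct alpha values are compared: 0.001, 0.1, 1, and 10. On the scaled data, test R-squared falls as alpha grows: 0.673 (alpha=0.001), 0.660 (0.1), 0.600 (1), and -0.006 (10), where every coefficient has been shrunk to zero. At alpha=0.1, scaling gives a slightly better test R-squared (0.660 versus 0.653). The best lasso and the best ridge both reach a test R-squared of about 0.673, the same as plain linear regression, so neither form of regularization improves on the baseline for this 506-observation dataset. Ridge holds up better at the same nominal alpha (0.667 at alpha=10 versus -0.006 for lasso), although alpha is not on the same scale for the two penalties.**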
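
The alpha=10 result above, with all coefficients at zero and a training R-squared of exactly 0.0, is typical lasso behaviour once alpha passes a data-dependent threshold. The sketch below computes that threshold for scikit-learn's `Lasso` objective, (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1, and counts nonzero coefficients along an alpha grid. The synthetic data and the names `alpha_max` and `nonzero_counts` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 13 features, only 5 of them informative (assumption)
X_syn, y_syn = make_regression(n_samples=200, n_features=13, n_informative=5,
                               noise=10.0, random_state=0)
X_std = StandardScaler().fit_transform(X_syn)

# Smallest alpha at which every lasso coefficient is zero
y_centered = y_syn - y_syn.mean()
alpha_max = np.max(np.abs(X_std.T @ y_centered)) / len(y_syn)

# Grid expressed relative to alpha_max; the last value lies just above it
alphas = [0.001 * alpha_max, 0.01 * alpha_max, 0.1 * alpha_max,
          0.5 * alpha_max, 1.1 * alpha_max]
nonzero_counts = []
for alpha in alphas:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X_std, y_syn)
    nonzero_counts.append(int(np.count_nonzero(lasso.coef_)))

for alpha, count in zip(alphas, nonzero_counts):
    print(f"alpha={alpha:.3f}: {count} nonzero coefficients")
```

Above `alpha_max` the model predicts the training mean for every row, which is why its training R-squared is 0 and its test R-squared sits slightly below 0. Choosing the grid relative to `alpha_max` keeps all candidate values in the range where the model is still informative.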